INTRODUCTION:


We  urgently  require  new  markers  that  are  predictive  of  the  biological  behavior  of  tumors  to  guide  the  types  and 
aggressiveness  of  therapy.  We  have  proposed  to  address  this  challenge  by  the  development  of  a  new  technology 
that  will  exploit  mitochondrial  DNA  mutations  as  novel  biomarkers  for  tumor  progression,  therapeutic 
response,  and  cancer  recurrence.  Our  hypothesis  is  that  an  ultrasensitive  measurement  of  CTC  and  ctmtDNA 
prevalence,  marked  by  homoplasmic  mtDNA  mutations  identical  to  those  in  the  primary  tumor,  will  serve  as 
early  independent  prognostic  indicators  of  tumor  stage,  therapeutic  response,  progression  and  recurrence.  We 
have  two  specific  aims.  In  Aim  1,  we  will  determine  the  rate  and  types  of  somatic  mtDNA  mutations  in 
prostatic  cancers.  In  Aim  2,  we  will  establish  whether  the  prevalence  of  circulating  tumor  mtDNA  can  serve  as 
a  sensitive  marker  of  clinical  stage,  progression,  and  recurrence. 

KEYWORDS: 

•  Biomarker  Discovery 

•  Prostate  Cancer 

•  Cancer  Genetics 

•  Mitochondria 

•  Survivorship 

•  DNA  Mutation 

•  Circulating  tumor  cells  (CTCs) 

•  Mutation  detection 

OVERALL  PROJECT  SUMMARY: 

Mutations  in  mitochondrial  DNA  (mtDNA)  lead  to  a  diverse  collection  of  diseases  that  are  challenging  to 
diagnose and treat. Each human cell has hundreds to thousands of mitochondrial genomes, and disease-associated
mtDNA mutations are homoplasmic in nature, i.e. the identical mutation is present in a
preponderance  of  mitochondria  within  a  tissue  (Chatterjee  et  al.,  2006;  Taylor  and  Turnbull,  2005).  Although 
the  precise  mechanisms  of  mtDNA  mutation  accumulation  in  disease  pathogenesis  remain  elusive,  we  have 
documented  multiple  homoplasmic  mutations  from  prostate  tumor  samples.  Indicative  of  their  involvement  in 
tumorigenesis  and  their  potential  utility  as  prognostic  and  predictive  markers,  mtDNA  mutations  identified  in 
prostatic  cancers  by  our  group  (Table  1)  and  previously  by  others  (Petros  et  al.,  2005)  are  predominantly 
nonsynonymous.  However,  in  our  dataset,  no  statistical  correlations  were  found  between  mtDNA  mutation  and 
clinical  significance  with  respect  to  Gleason  score,  PSA  level,  clinical  stage,  recurrence,  therapeutic  response, 
or  progression. 

Mutations  in  prostate  tumors:  discovery  and  monitoring 

We  have  successfully  identified  multiple  homoplasmic  mutations  from  prostate  tumor  samples  (Table  1).  To 
ensure accurate identification of homoplasmic tumor mtDNA mutations, patient-matched normal peripheral
blood  cells  and  multiple  pure  prostatic  carcinoma  cells  isolated  with  laser  capture  microdissection  (LCM)  from 
surgically  resected  tumors  (radical  prostatectomies)  were  collected  from  patients  who  have  given  informed 
consent and received no prior treatment. The entire mitochondrial genome was sequenced in prostatic cancer,
adjacent normal tissue, and blood samples from each patient by first PCR-amplifying the mtDNA with 28
previously described primer pairs (Taylor et al., 2001). Clonally expanded mtDNA mutations were scored only
when  the  sequence  of  the  tissue  samples  differed  from  that  of  the  patient-matched  normal  peripheral  blood  cells. 
All  regions  with  detected  mutations  were  reamplified  and  sequenced  to  rule  out  the  possibility  of  the  mutations 
being  produced  by  polymerase  errors  during  the  PCR  or  sequencing  processes.  In  addition,  to  guard  against  the 
sample mix-up and contamination that have confounded many mtDNA mutation studies (Salas et al., 2005), we
compared  each  patient’s  sequences  to  the  revised  Cambridge  Reference  Sequence  (rCRS)  to  confirm  they 
shared  common  polymorphisms. 
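
The scoring rule described above (call a tumor mutation only where the tissue sequence differs from the patient-matched blood sequence, and only if the difference survives reamplification) can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in the study; the function names and the assumption that sequences are already aligned to equal length are hypothetical.

```python
def somatic_differences(tumor, blood):
    """Return (1-based position, blood base, tumor base) for every mismatch.

    Assumes both sequences are already aligned and of equal length
    (an assumption made for this sketch only).
    """
    if len(tumor) != len(blood):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, b, t)
        for i, (t, b) in enumerate(zip(tumor, blood))
        if t != b
    ]


def confirmed(first_pass, reamplified):
    """Keep only differences observed in both independent amplifications."""
    return sorted(set(first_pass) & set(reamplified))
```

A difference seen in only one of the two amplifications is dropped, which mirrors the reamplification check used to exclude polymerase errors.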

To track these mutations using the Random Mutation Capture (RMC) methodology, we set out to expand RMC
coverage  to  other  mutational  targets  in  accordance  with  our  statement  of  proposed  work.  We  first  identified 
robust  restriction  enzyme  recognition  sequences  that  can  be  accurately  monitored  for  mutation.  Digestion  of 
mtDNA  was  performed  with  the  required  restriction  enzymes  (Table  1)  and  the  efficiency  of  digestion  was 
monitored  by  QPCR  with  primers  that  flank  the  restriction  sites.  However,  suboptimal  restriction  enzyme 
efficiencies  were  observed,  limiting  our  ability  to  detect  point  mutations,  as  compared  to  our  resolution  with 
TaqI. Alternatively, given that mtDNA deletions are found in many tumors (Kulawiec et al., 2010) (Table 1), an
adaptation  of  the  RMC  technology  would  allow  us  to  utilize  these  mutations  to  track  CTCs  and  ctmtDNA.  As 
such,  we  developed  a  highly  sensitive  tool,  termed  Digital  Deletion  Detection  (3D),  for  detection, 
quantification, and characterization of rare (< 1 per 10 molecules) deletion events in mtDNA (Taylor et al.,
2013).  3D  utilizes  droplet  digital  PCR  for  direct  enumeration  of  deletion  mutations  and  provides  several  process 
pathways  for  characterization  of  the  diversity  of  deletion  sizes  and  breakpoints  within  the  sample  population. 
Single  molecule  compartmentalization  minimizes  amplification  bias,  allowing  simultaneous  analysis  of  a  large 
range  of  deletion  products  and  more  accurate  sequencing  analysis.  The  assay  can  be  readily  adapted  to 
interrogate  multiple  diverse  regions  within  the  genome. 

3D  assay  design 

Digital  Deletion  Detection  (3D)  is  an  extremely  sensitive  tool  for  the  absolute  quantification  and 
characterization  of  rare  deletion  molecules.  The  basic  strategy  behind  3D  is  a  three-step  process:  enrich, 
amplify,  and  analyze.  The  first  step,  based  on  methods  developed  previously  by  Bielas  and  colleagues,  enriches 
for  deletion-bearing  molecules  and  improves  mutant  specificity  (Bielas  and  Loeb,  2005;  Vermulst  et  al.,  2008a). 
This step consists of targeted endonucleolytic digestion of templates to selectively digest wild-type (WT)
molecules,  thus  allowing  the  preferential  PCR  amplification  of  molecules  bearing  an  appropriate  deletion 
(Figure 1A). After digestion, the DNA molecules are sequestered into homogeneous 1 nL water-in-oil emulsion
droplets and subjected to normal PCR amplification (Figure 1B). The concentration of molecules within the
droplets is adjusted such that most droplets contain no mutant genomes, while a small fraction contains only
one. Thus a single well in the reaction actually consists of many thousands of single-molecule reaction chambers.
This process allows each captured deletion to be amplified without introducing many of the PCR artifacts and
biases that are common to bulk amplification reactions (e.g. template switching and preferential amplification of
short  templates). 

Following  amplification,  the  deletions  can  be  analyzed  via  two  process  pathways.  In  the  quantification  pathway, 
high  resolution  quantification  of  deletions  is  accomplished  through  the  use  of  droplet  digital  PCR  (ddPCR) 
(Pinheiro  et  al.,  2012).  With  the  inclusion  of  TaqMan  reporter  chemistry,  droplets  bearing  amplified  templates 
are  readily  distinguished  by  their  fluorescence  amplitude  using  a  cytometry  system.  Because  the  droplet 
volumes are highly uniform, Poisson statistics can be applied to calculate the average number of deletion-bearing
molecules per droplet, and the absolute concentration of mutant molecules can be determined with high
precision  and  accuracy  (Pinheiro  et  al.,  2012).  Alternatively,  in  the  characterization  pathway,  droplets  are 
disrupted  and  amplicons  recovered.  The  deletions  can  then  be  directly  sequenced  using  high-throughput  or  ‘next 
generation’  sequencing,  or  cloned  for  use  in  Sanger  sequencing  or  other  downstream  applications. 
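
The Poisson step in the quantification pathway can be illustrated with a short calculation. If a fraction p of droplets is positive, the mean number of target molecules per droplet is -ln(1 - p), and dividing by the droplet volume gives an absolute concentration. The sketch below assumes the 1 nL droplet volume stated above; the function names and the example droplet counts are hypothetical, and this is not the droplet reader's own software.

```python
import math


def copies_per_droplet(n_positive, n_total):
    """Mean target molecules per droplet from the fraction of positive droplets."""
    if not 0 <= n_positive < n_total:
        raise ValueError("need 0 <= n_positive < n_total")
    fraction_positive = n_positive / n_total
    # Poisson: P(droplet is empty) = exp(-lambda), so lambda = -ln(1 - p)
    return -math.log(1.0 - fraction_positive)


def concentration_per_ul(n_positive, n_total, droplet_volume_nl=1.0):
    """Absolute concentration in copies per microliter of partitioned reaction."""
    lam = copies_per_droplet(n_positive, n_total)
    return lam / (droplet_volume_nl * 1e-3)  # 1 nL = 1e-3 microliter
```

For example, 1,000 positive droplets out of 20,000 corresponds to about 0.051 copies per droplet, slightly more than the raw positive fraction of 0.05 because some droplets hold more than one molecule.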

3D  sensitivity  and  recovery 

Using  the  quantification  process  pathway  of  3D,  we  measured  the  absolute  deletion  frequency  within  a  region 
spanning  the  ND1/ND2  genes  in  mitochondrial  DNA  isolated  from  human  epithelial  cells  in  tissue  culture.  We 
measured the deletion frequency to be 1.6 ± 0.4 deletions per ten million genomes (or 1.6 x 10^-7 per genome)
(Figure  2).  We  next  asked  whether  3D  was  able  to  fully  recover  all  of  the  deletions  within  a  sample  over  a  broad 
range  of  deletion  frequencies.  To  address  this  we  performed  a  series  of  reconstruction  experiments.  First,  a 
plasmid  harboring  a  fragment  of  mtDNA  containing  a  known  deletion  in  the  ND1/ND2  region  was  mixed  at  a 
constant concentration (3 copies/µL) against increasingly higher levels of genomic mtDNA (up to 2.5 x 10^6
copies/µL). We then performed 3D analysis to determine if the low concentration of the control molecules could
be  accurately  quantified  in  the  presence  of  increasing  concentrations  of  background  DNA  (Figure  2).  This 
reconstruction demonstrated accurate quantification of target molecules across a range of frequencies spanning
eight orders of magnitude, with sensitive recovery at frequencies as low as 1 x 10^-7 per genome. Because we
reached  the  endogenous  deletion  frequency  of  the  background  DNA,  we  were  unable  to  test  lower  frequencies 
in  the  reconstruction  experiment. 

Capturing  and  analyzing  sample  complexity 

Analysis  of  fluorescence  amplitudes  of  the  three  control  plasmids  following  ddPCR  revealed  that  under  the 
current  conditions,  a  given  template  will  yield  an  average  droplet  fluorescence  intensity  inversely  proportional 
to  the  template  size  (Figure  3).  When  the  three  control  templates  were  combined,  this  effect  led  to  a  striking 
multimodal  distribution  in  the  fluorescence  amplitudes  (Figure  3A).  More  generally,  we  found  that  the  sample 
heterogeneity  is  reflected  in  the  distribution  of  fluorescence  amplitudes  (Figure  3C).  Thus,  the  average 
amplitude  and  distribution  of  the  droplet  fluorescence  can  be  used  to  predict  deletion  sizes  and  complexity  (e.g. 
presence  of  a  single,  clonal  deletion  vs.  a  heterogeneous  population  of  multiple  deletions).  This  discovery  led  to 
the  development  of  QuantiSize  (Laurie  et  al.,  2013),  which  combines  quantification  and  size  determination  in  a 
single  ddPCR  experiment.  With  standard  ddPCR  reagent  concentrations,  DNA  amplification  is  eventually 
limited  by  the  availability  of  dNTPs  and  inhibited  by  the  presence  of  pyrophosphate  (Hori  et  al.,  2007;  Xiao  et 
al.,  2004).  This  means  that  the  amplification  of  long  DNA  templates  consumes  more  dNTPs  and  generates  more 
pyrophosphate,  resulting  in  fewer  products  than  short  templates  at  the  endpoint  of  a  standard  reaction.  Because 
the  final  number  of  products  generated  within  a  droplet  determines  its  level  of  fluorescence,  the  measured 
fluorescence  amplitude  of  droplets  containing  short  templates  will  be  greater  than  that  of  droplets  containing 
long  templates.  The  QuantiSize  assay  exploits  this  fact  to  generate  an  equation  relating  fluorescence  amplitude 
to  amplicon  size  by  using  measurements  of  known  size  standards.  The  equation  describing  the  relationship 
between  fluorescence  amplitude  and  amplicon  size  can  be  used  to  calculate  the  size  of  any  unknown  ddPCR 
template  that  shares  common  primer  and  probe  binding  sites  with  the  size  standards.  Creating  size  standards 
that  have  primer  and  probe  binding  sites  in  common  with  DNA  samples  can  be  accomplished  in  a  number  of 
ways  including  cloning  sample  DNA  into  a  vector  and  appending  adapter  sequences  to  both  the  sample  DNA 
and  size  standards  (Zhang  et  al.,  2003). 

We  created  a  set  of  size  standards  applicable  to  Illumina  NGS  libraries  containing  inserts  ranging  from  25  to 
1000  base  pairs  flanked  by  adapter  sequences  compatible  with  the  Illumina  MiSeq  platform.  A  pair  of  primers 
and  a  fluorescent  TaqMan  probe  were  designed  to  hybridize  to  the  adapter  sequences  such  that  the  length  of 
each  amplicon  is  160  base  pairs  plus  the  length  of  the  insert.  As  the  primers  and  probe  are  specific  to  the  MiSeq 
adapter  sequences,  only  the  adapter- ligated  molecules  that  are  amplifiable  on  the  MiSeq  flow  cell  will  be 
quantified. 

A  ddPCR  experiment  was  performed  with  the  aforementioned  size  standards  in  separate  wells  of  a  96-well 
plate.  Droplets  containing  the  target  (positive)  increased  in  fluorescence  following  amplification  of  the  target 
whereas  droplets  lacking  the  target  (negative)  remained  at  the  background  level  of  fluorescence  (Figure  4A). 
The  distribution  of  droplet  amplitudes  is  consistent  across  most  amplicon  lengths,  but  the  760  and  860  bp 
amplicons  show  a  broader  distribution  of  amplitudes  (Figure  4B).  An  inverse,  linear  correlation  between 
amplicon size and mean fluorescence amplitude was observed (R^2 = 0.99436) (Figure 4C). The equation
describing  this  correlation  allows  for  the  calculation  of  amplicon  size  given  a  measured  fluorescence  amplitude. 
The  slope  of  this  equation  provides  a  measure  of  the  difference  in  mean  fluorescence  amplitude  that  is  expected 
with  a  given  difference  in  amplicon  size.  Maximizing  the  magnitude  of  this  slope  maximizes  the  resolution  of 
size  standards,  which  is  advantageous  for  the  purpose  of  determining  the  length  of  unknown  amplicons  more 
accurately.  The  size  standards  used  for  QuantiSize  are  highly  analogous  to  the  standards  used  in  gel  and 
capillary  electrophoresis.  The  size  of  unknown  DNA  can  be  determined  by  visually  comparing  the  fluorescence 
amplitude  of  the  size  references  to  that  of  the  unknown  DNA  or  by  entering  the  fluorescence  amplitude  value 
into  the  equation  describing  the  relationship  between  average  fluorescence  amplitude  and  amplicon  size  for  the 
size  standards. 
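
The size calculation described above amounts to fitting a straight line to the size standards and inverting it for an unknown. The sketch below shows one way to do this with ordinary least squares. The standard sizes and amplitudes in the usage note are synthetic values chosen for illustration, not measurements from this work, and the function names are hypothetical.

```python
def fit_line(sizes_bp, mean_amplitudes):
    """Ordinary least-squares fit of amplitude = slope * size + intercept."""
    n = len(sizes_bp)
    mean_x = sum(sizes_bp) / n
    mean_y = sum(mean_amplitudes) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_bp)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(sizes_bp, mean_amplitudes)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def size_from_amplitude(amplitude, slope, intercept):
    """Invert the standard curve to estimate amplicon size in base pairs."""
    return (amplitude - intercept) / slope
```

With synthetic standards of 260, 460, 660, and 860 bp at amplitudes of 9400, 7400, 5400, and 3400, the fit returns a slope of -10 and an intercept of 12000, and an unknown with amplitude 6400 is estimated at 560 bp. The estimate is only meaningful for templates that share primer and probe binding sites with the standards and fall within their size range.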

The droplet reader software counts positive and negative droplets by using a threshold of fluorescence between
the  well-defined  populations  of  high  and  low  fluorescence  amplitude  droplets.  For  one  particular  TaqMan  probe 
tested,  the  fluorescence  amplitude  of  droplets  containing  amplicons  larger  than  660  bp  was  too  low  to  reliably 
discriminate  between  positive  and  negative  droplets  when  templates  are  amplified  with  a  one-minute  elongation 
time.  When  this  is  the  case,  the  average  fluorescence  amplitude  for  these  amplicons  cannot  be  calculated. 
Increasing  the  elongation  time  to  two  minutes  increases  the  fluorescence  amplitude  of  all  droplets  containing 
amplifiable  template  (Figure  5).  This  enables  the  acquisition  of  accurate  concentration  and  fluorescence 
amplitude  data  for  longer  templates,  but  the  slope  of  the  relationship  between  amplicon  size  and  fluorescence 
amplitude  is  decreased  (from  m  =  -11.66  to  m  =  -9.12),  which  decreases  the  ability  to  resolve  small  differences 
in  amplicon  size  (Figure  5).  Decreasing  the  elongation  time  to  30  seconds  increases  the  resolution  of  the 
relationship  between  amplicon  size  and  fluorescence  amplitude,  but  prevents  targets  longer  than  460  bp  from 
amplifying  to  the  point  that  they  fluoresce  detectably  above  the  background  fluorescence  (Figure  5).  This  is 
likely  due  to  the  fact  that  longer  products  require  more  time  for  complete  polymerization  of  nascent  strands  to 
occur.  Thus,  there  is  a  tradeoff  between  the  resolution  and  range  of  QuantiSize,  though  the  assay  can  be  easily 
adjusted  to  fit  particular  experimental  needs. 
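
The resolution side of this tradeoff can be made concrete with a rough calculation: for a given spread in droplet amplitude, the smallest size difference that can be distinguished scales inversely with the magnitude of the slope. The sketch below assumes the reported slopes are in amplitude units per base pair and uses a hypothetical amplitude standard deviation of 100 units; both are assumptions for illustration only.

```python
def resolvable_size_difference(amplitude_sd, slope):
    """Approximate size difference (bp) matching one SD of amplitude noise."""
    if slope == 0:
        raise ValueError("slope must be non-zero")
    return amplitude_sd / abs(slope)


# Slopes reported in the text; the SD of 100 amplitude units is hypothetical.
one_minute = resolvable_size_difference(100.0, -11.66)
two_minutes = resolvable_size_difference(100.0, -9.12)
```

Under these assumptions the one-minute elongation resolves smaller size differences than the two-minute elongation, consistent with the loss of resolution described above when the elongation time is increased.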

3D  Summary 

In order to adequately detect de novo mtDNA deletions and trace the frequency dynamics, an assay is needed
that  can  enrich  for  and  directly  quantify  extremely  rare  deletion  events.  Current  approaches  to  analyzing 
mtDNA deletions include Southern blotting (DiMauro and Hirano, 1993), direct sequencing (Ameur et al.,
2011; Kato et al., 2011; Sequeira et al., 2012; Spelbrink et al., 2000), and PCR amplification (Kraytsberg et al.,
2008).  Sequencing  of  deletions  via  cloning  is  laborious,  time-consuming,  prone  to  cloning  artifacts,  and  allows 
only  the  most  abundant  deletion  types  to  be  analyzed  (Supplementary  Notes  3  and  4).  Massively-parallel  or 
‘next  generation’  sequencing  is  rapidly  becoming  a  preferred  means  for  high-throughput  screening  of  individual 
DNA  molecules.  As  an  example,  Illumina,  Inc.  (San  Diego,  CA)  offers  systems  that  generate  from  17  million 
(MiSeq®)  up  to  3  billion  simultaneous  sequencing  reads  per  run  (HiSeq®)(Liu  et  al.,  2012).  However,  given  a 
relatively  short  read  length  of  less  than  150  bp  and  the  fact  that  the  majority  of  the  reads  will  be  off-target,  this 
remains  insufficient  to  adequately  resolve  mtDNA  deletions  that  occur  at  frequencies  of  less  than  one  in  a 
million  genomes.  Even  assuming  no  off-target  reads,  the  MiSeq®  instrument  would  still  only  yield  about  one 
deletion  in  ten  runs.  It  is  therefore  critical  that  a  selection  step  be  performed  to  limit  the  number  of  off-target 
reads  and  to  enrich  for  deletion-bearing  molecules. 

PCR-based  methods,  including  long-distance  PCR  and  real-time  quantitative  PCR,  are  among  the  most 
frequently  employed  methods  for  both  selection  and  amplification  of  deletions  (Chabi  et  al.,  2003;  Cortopassi 
and Arnheim, 1990; He et al., 2002; Kraytsberg et al., 2008). Generally speaking, these assays distinguish wild-type
from deleted genomes through exploitation of differences in amplicon fragment lengths and amplification
efficiencies.  Given  that  they  do  not  select  for  deleted  molecules  prior  to  amplification,  one  of  the  main 
drawbacks  is  high  background  signal  from  contaminating  wild-type  molecules,  limiting  the  effective  sensitivity. 
Furthermore,  these  bulk  PCR  assays  tend  to  introduce  a  number  of  additional  artifacts  arising  from  preferential 
amplification  of  small  templates  (allelic  preference),  introduction  of  false  deletions  through  template  jumping, 
and  other  PCR  errors  (Kraytsberg  and  Khrapko,  2005).  Real-time  quantitative  PCR  (qPCR)  can  be  quite 
sensitive,  but  its  reliance  on  relative  differences  in  crossing  thresholds  rather  than  direct  quantification  makes  it 
more  suitable  for  measuring  fold  changes  rather  than  absolute  deletion  frequencies  (Chabi  et  al.,  2003;  He  et  al., 
2002).  Digital  PCR  methods,  including  long  single  molecule  PCR  (long  smPCR)  (Guo  et  al.,  2010;  Kraytsberg 
and  Khrapko,  2005)  and  the  random  mutation  capture  assay  developed  for  mtDNA  deletions  (deletion 
RMC)(Vermulst  et  al.,  2008a;  Vermulst  et  al.,  2008b)  achieve  direct  quantification  through  the  use  of  single 
molecule  partitioning  in  96-well  plates.  Partitioning  additionally  serves  to  minimize  artifacts  of  template 
jumping  and  allelic  preference  that  are  common  in  bulk  PCR  reactions  (Kraytsberg  and  Khrapko,  2005). 

Despite these advantages, this approach becomes laborious and costly when using the wells of a multi-well
plate as the partition, and it yields only a handful of the most common deletions within a sample.

The digital deletion detection (3D) assay shows a marked improvement in specificity, sensitivity, and accuracy
over  other  available  methods.  This  is  achieved  via  a  three-step  process  of  selection,  amplification,  and 
characterization  (i.e.  quantification  or  sequencing).  As  with  deletion  RMC,  high  specificity  for  deletion-bearing 
molecules  is  achieved  through  the  destruction  of  WT  template  molecules  by  restriction  endonuclease,  thereby 
selecting  for  and  enriching  mutant  molecules  prior  to  amplification.  Following  enrichment,  partitioning  for 
digital  PCR  amplification  is  performed  through  the  generation  of  up  to  20,000  droplet  partitions,  the  equivalent 
of  over  200  96-well  plates,  within  a  single  reaction  well.  Quantification  is  greatly  facilitated  through  the  use  of 
TaqMan  reporter  probes  and  cytometry,  which  allows  for  rapid  enumeration  of  all  partitions  that  contain  an 
amplifiable  template  and  direct  quantification  of  all  deletions  within  a  sample. 

Mutations  in  tumor  mtDNA  that  do  not  disrupt  restriction  sites 

The majority of detected homoplasmic mutations (Table 1) did not fall within a known restriction enzyme
recognition site. As such, to achieve our project’s goals, it was necessary to develop a novel mutation detection
assay (see 2012 updated SOW) to test whether the frequency of circulating homoplasmic mtDNA tumor
mutations in patients with prostate cancer would be a specific and sensitive marker of therapeutic response,
progression, and recurrence. We therefore began the development of a novel DNA-based technology, termed
Rolling Cypher Seq, which also exploits somatic mtDNA mutations for the early detection of disease. By
exploiting  advances  in  Next  Generation  Sequencing  (NGS)  technologies,  Rolling  Cypher  Seq  is  expected  to 
permit  the  enumeration  of  any  mutation  with  unparalleled  sensitivity,  regardless  of  its  location  in  the  genome. 

The  identification  of  rare  somatic  mutations  that  are  present  in  a  small  fraction  of  DNA  templates  is  essential 
for  DNA-based  early  detection  methodologies.  Although  massively  parallel  sequencing  instruments  are,  in 
principle,  well-suited  for  this  task,  the  error  rates  associated  with  NGS  are  too  high  to  allow  confident 
identification  of  rare  variants.  For  example,  the  error  rates  vary  from  ~1%  (Nazarian  et  al.,  2010;  Quail  et  al., 
2008)  to  ~0.05%  (Gore  et  al.,  2011;  He  et  al.,  2010)  of  bases  sequenced  with  the  commonly  used  Illumina 
sequencing  instruments.  To  address  this  limitation,  we  have  developed  a  novel  method  for  detecting  rare 
mutations (< 1 mutant base pair among 10^8 wild-type nucleotides) in any target DNA molecule. Our method
utilizes rolling circle amplification (RCA) on a generated library of vectors, each containing unique
double-stranded barcode pairs (cyphers). Primers used in the RCA step are designed to selectively amplify the DNA
molecule  of  interest.  Since  RCA  copies  from  the  same  circular  template  molecule  with  each  cycle,  it 
circumvents  the  clonal  amplification  of  polymerase  errors  observed  in  successive  PCR  cycles.  Moreover, 
unique  cyphers  (Figure  6A)  flanking  each  copy  of  the  target  molecule  will  allow  us  to  deconvolute  the  NGS 
data  and  accurately  distinguish  between  polymerase  error  artifacts  and  true  mutations  (Figure  6B).  Over  the  last 
year we have been able to successfully demonstrate the utility of our method to eliminate background
sequencing errors (Figure 7). This monumental advance in sequencing resolution puts us in a position to
enumerate CTC (circulating tumor cells) and ctmtDNA (circulating tumor mtDNA) in prostate cancer patients
using any mutated base pair. However, RCA enrichment would still be required to monitor for these rare
circulating mutant variants in blood. Unfortunately, while we are making headway, our enrichment strategy is
still being optimized; thus we have been unable to enumerate these endpoints thus far. However, we have
institutional support to continue our assay development, and we hope to complete the stated project goals once our
RCA enrichment technology development is complete.
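
The deconvolution idea, in which copies sharing a cypher are compared so that an error present in only some copies is discarded while a true mutation appears in all of them, can be sketched as a barcode-family consensus. This is a generic illustration under stated assumptions, not the Rolling Cypher Seq analysis pipeline: the function name, the thresholds, and the assumption that reads are already trimmed to equal length are all hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_cypher(reads, min_copies=3, min_agreement=0.6):
    """Build one consensus sequence per cypher (barcode) family.

    reads: iterable of (cypher, sequence) pairs; sequences within a family
    are assumed to be the same length. Families with fewer than min_copies
    reads are skipped. A position becomes "N" when the most common base is
    supported by less than min_agreement of the family's reads.
    """
    families = defaultdict(list)
    for cypher, sequence in reads:
        families[cypher].append(sequence)

    consensus = {}
    for cypher, sequences in families.items():
        if len(sequences) < min_copies:
            continue
        bases = []
        for column in zip(*sequences):
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(column) >= min_agreement else "N")
        consensus[cypher] = "".join(bases)
    return consensus


example_reads = [
    ("AAC", "ACGT"), ("AAC", "ACGT"), ("AAC", "ACTT"),  # one copy carries an error
    ("GGT", "ATGT"), ("GGT", "ATGT"), ("GGT", "ATGT"),  # variant in every copy
    ("TTA", "ACGT"),                                    # too few copies to call
]
example_consensus = consensus_by_cypher(example_reads)
```

In this toy input the single discordant base in the AAC family is outvoted, the variant carried by every copy in the GGT family is retained, and the TTA family is skipped because it has only one read.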

KEY  RESEARCH  ACCOMPLISHMENTS: 

•  Detected  and  characterized  homoplasmic  mutations  in  prostate  cancer 

•  Developed ultrasensitive methods to enumerate deletions and point mutations in DNA

CONCLUSION: 

We have been able to successfully demonstrate the utility of a new mutation assessment method to eliminate
background sequencing errors. This monumental advance in sequencing resolution puts us in a position to
enumerate CTC (circulating tumor cells) and ctmtDNA (circulating tumor mtDNA) in prostate cancer
patients using any mutated base pair. However, mtDNA enrichment is still required to monitor for these
rare circulating mutant variants in blood. Unfortunately, while we are making headway, our enrichment
strategy is still being optimized; thus we have been unable to enumerate these endpoints thus far. However,
we have institutional support to continue our assay development, and hope to complete the stated project goals
once our RCA enrichment technology development is complete. We will then use the sensitive mutation detection
assays to test whether the frequency of circulating homoplasmic mtDNA tumor mutations in patients with
prostate cancer is a specific and sensitive marker of prostatic tumor therapeutic response,
progression, and recurrence.

PUBLICATIONS,  ABSTRACTS,  AND  PRESENTATIONS: 

Peer-Reviewed  Scientific  Journals: 

Simultaneous  digital  quantification  and  fluorescence-based  size  characterization  of  massively  parallel 
sequencing  libraries.  Laurie  MT,  Bertout  JA,  Taylor  SD,  Burton  JN,  Shendure  JA,  Bielas  JH.  Biotechniques. 
2013  Aug;  (2):61-7.  (See  Appendix  1) 

Targeted  enrichment  and  high-resolution  digital  profiling  of  DNA  deletions  in  mitochondria.  Taylor  SD, 
Ericson  NG,  Burton  JN,  Prolla  TA,  Silber  JR,  Shendure  J,  Bielas  JH.  Aging  Cell.  2013  Aug;  [Epub  ahead  of 
print]. (See Appendix 2)

Presentations: 

08/2013  Mitochondrial DNA Mutagenesis: Insight into Human Aging, Carcinogenesis, and Novel
Anticancer Therapies, Ellison Medical Foundation Colloquium on the Biology of Aging, Woods
Hole, MA

05/2013  Mechanism and Clinical Utility of Nuclear and Mitochondrial DNA Mutations in Cancer,
Biochemistry and Molecular Biology, Faculty of Medicine, Dalhousie University, Halifax, NS,
Canada

03/2013  Mechanism and Clinical Utility of Nuclear and Mitochondrial DNA Mutations in Cancer, Irish
Association for Cancer Research Annual Meeting, Dublin, Ireland

12/2012  Digital Detection of Rare Mutations, First Annual Droplet Digital PCR User Meet, Boston, MA

10/2012  Digital Detection of Rare Mutations and Tumor Infiltrating T Cells: Clinical Application in
Cancer and Disease, Digital PCR Applications and Advances, Cambridge Healthtech Institute,
San Diego, CA

10/2012  Mechanism and Clinical Utility of Nuclear and Mitochondrial DNA Mutations in Cancer,
External Scientific Advisory Board Meeting, Fred Hutchinson and University of Washington
Cancer Consortium, Seattle, WA

09/2012  Metabolism and Mitochondrial Mutagenesis in Human Colorectal Cancer, Metabolism and
Metabolites Symposium, Fred Hutchinson Cancer Research Center, Seattle, WA

07/2012  Nuclear and Mitochondrial DNA Mutations: Mechanisms and Disease, Meeting of the
Outstanding New Environmental Scientist (ONES) Grantee Forum, NIEHS, Research Triangle
Park, NC

INVENTIONS,  PATENTS  AND  LICENSES: 

12-032, Provisional Applications 61/654,236 filed 6/1/2012 and 61/783^15 filed on 3/14/2013 (the priority
patent applications) and the subsequent international patent application PCT/US2013/043158 filed 5/29/2013
(“Compositions  and  Methods  for  Detecting  Rare  Nucleic  Acid  Molecule  Mutations”)  (the  patents  claiming 
digital  PCR  for  quantitation  of  DNA  for  sequencing)  all  include  the  following  statement: 

“STATEMENT  OF  GOVERNMENT  INTEREST” 

This  was  funded  in  part  by  U.S.  Department  of  Defense/Congressionally  Directed  Medical  Research 
Programs  Grant  No.  W81XWH-10-1-0563  and  by  National  Institute  of  Environmental  Health  Sciences  R01 
Grant  ES019319.  The  government  has  certain  rights  in  this  invention.” 

REPORTABLE  OUTCOMES: 

•  Detected  and  characterized  homoplasmic  mutations  in  prostate  cancer 

•  Developed ultrasensitive methods to enumerate deletions and point mutations in DNA

Both of the above-mentioned outcomes should permit the measurement of CTC and ctmtDNA prevalence, to advance
the understanding, prevention, diagnosis, prognosis, and treatment of prostate cancer.


OTHER  ACHIEVEMENTS: 

Nothing  to  Report. 

Figure 1. Overview of Digital Deletion Detection (3D). (A) Enrichment of deletion-bearing molecules. WT
molecules  harbor  endonuclease  recognition  sites  within  the  target  region.  Upon  digestion,  the  target  is  cleaved, 
making  the  WT  molecule  unsuitable  as  a  template  for  PCR  amplification.  In  contrast,  mutant  molecules  that 
harbor  deletions  and  remove  the  restriction  recognition  sites  are  resistant  to  digestion.  These  molecules  serve  as 
templates  for  PCR  amplification.  The  presence  of  the  TaqMan®  hydrolysis  probe  allows  for  the  detection  and 
enumeration  of  each  molecule  in  the  sample  bearing  the  appropriate  deletion.  (B)  Mutant  target  molecules 
(depicted  as  an  idealized,  unbroken  circular  mitochondrial  chromosome)  are  individually  sequestered  into  1  nL 
water-in-oil droplets along with TaqMan® PCR chemistry and target-specific TaqMan® probes. Droplets are
thermally  cycled.  Because  the  average  number  of  molecules  per  droplet  is  less  than  one,  positive  droplets  (green 
droplets)  represent  individual  reaction  vessels  for  single-molecule  quantitative  PCR  amplification.  Droplets  are 
individually  scanned  and  scored  as  positive  or  negative,  thus  providing  a  digital  quantification  of  all  deletion¬ 
bearing  molecules  within  the  sample.  Alternatively,  droplets  can  be  disrupted  and  the  amplification  products 
subjected  to  physical  characterization,  for  example  cloning,  sequencing,  or  other  applications. 
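
The digital readout described in panel (B) can be converted to a concentration with a Poisson correction, since a droplet scored as positive may hold more than one molecule. The Python sketch below is illustrative only: the function name and the default 1 nL droplet volume (taken from the legend) are assumptions, and it is not the droplet reader software's implementation.

```python
import math


def ddpcr_concentration(positive, total, droplet_volume_nl=1.0):
    """Estimate target copies per microliter from droplet counts.

    Assumes molecules are distributed among droplets at random (Poisson),
    so the mean copies per droplet is -ln(fraction of negative droplets).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    negative_fraction = (total - positive) / total
    copies_per_droplet = -math.log(negative_fraction)
    droplets_per_ul = 1000.0 / droplet_volume_nl  # 1 uL = 1000 nL
    return copies_per_droplet * droplets_per_ul
```

With half of the droplets positive, the estimate is about 693 copies per microliter rather than 500, which shows why the correction matters as droplet occupancy rises.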

Figure 2. Sensitivity and recovery. 3D was performed on TaqI-digested HCT 116 mtDNA using primers and
probes for the human ND1/ND2 site to give the endogenous deletion frequency (empty circles). Reconstruction
experiments were performed by spiking in 3 molecules µL⁻¹ of a control plasmid bearing a portion of the
human mitochondrial genome with a known deletion (3534Δ997) into a serial dilution series of TaqI-digested
HCT 116 mtDNA (filled circles). The predicted deletion frequency is plotted against the measured deletion
frequency. Each data point represents an individual experiment. The reconstruction data were fit to y = x (dotted
line) with a correlation coefficient R² = 0.9942.

Figure 3. Effects of sample heterogeneity on 3D analysis. (A) Three plasmid controls (3534Δ997, 3719Δ809,
and 3871Δ492) were diluted to an expected concentration of 300 molecules µL⁻¹ template and subjected to 3D
analysis, either individually or combined. Blue dots represent droplets whose amplitudes are above the
threshold ('positives'), while gray dots represent droplets whose amplitudes are below the threshold value
('negatives'). (B) Measured deletion concentration for individual and combined templates. Error bars indicate
the Poisson 95% confidence intervals for each concentration determination. (C) Box and whisker plot showing
the distribution of positive droplets for each template. When used as a template in PCR, each plasmid yields
a different size fragment (185 bp, 372 bp, and 686 bp, respectively). There is an inverse relationship between
average fluorescence amplitude and template length, as well as a relationship between the sample complexity
and the breadth of the distribution of positive droplets.
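
The Poisson 95% confidence intervals mentioned in panel (B) can be approximated from droplet counts alone. The sketch below is a hedged illustration, not the method used to draw the error bars: it applies a normal approximation to the fraction of positive droplets and then transforms the bounds with the Poisson relation. The function name, the 1 nL droplet volume, and the z value of 1.96 are assumptions.

```python
import math


def poisson_ci_copies_per_ul(positive, total, droplet_volume_nl=1.0, z=1.96):
    """Return (low, point, high) estimates in copies per microliter.

    Uses a normal approximation for the positive-droplet fraction, then
    converts each bound with lambda = -ln(1 - fraction).
    """
    if total <= 0 or not 0 < positive < total:
        raise ValueError("need 0 < positive < total")
    p = positive / total
    se = math.sqrt(p * (1.0 - p) / total)
    p_low = max(p - z * se, 0.0)
    p_high = min(p + z * se, 1.0 - 1.0 / total)  # keep the log argument > 0
    droplets_per_ul = 1000.0 / droplet_volume_nl

    def to_conc(fraction):
        return -math.log(1.0 - fraction) * droplets_per_ul

    return to_conc(p_low), to_conc(p), to_conc(p_high)
```

The interval narrows as more droplets are read, which is one reason droplet number affects the precision of each concentration determination.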

Figure 4. ddPCR amplification of 10 size standards designed for use with the QuantiSize assay. All size
standards were amplified in parallel with standard reagent and thermal cycling conditions. (A) Scatter plot of
fluorescence amplitude of individual droplets for each size standard. Droplets whose fluorescence amplitude is
above a specified threshold ("positives") are shown in black and droplets with fluorescence amplitude below the
threshold ("negatives") are shown in gray. (B) Box-and-whisker plots showing distribution of fluorescence
amplitudes of positive droplets. Horizontal bars mark the mean fluorescence amplitude, boxes mark the
interquartile range, and whiskers mark the 95% confidence interval. (C) Plot of mean fluorescence amplitude ±
SEM versus amplicon size showing a linear correlation (R² = 0.9943).

Figure 5. Effect of ddPCR elongation time on the relationship between fluorescence amplitude ± SEM and
amplicon size. Three ddPCR experiments were carried out with the same size standards using 0.5, 1, and 2
minute elongation times during droplet thermal cycling. With a 0.5 minute elongation time (blue), the slope of
the regression line relating fluorescence amplitude to amplicon size was -13.760 (R² = 0.9905). With a 1 minute
elongation time (red), the slope was -11.460 (R² = 0.9906). With a 2 minute elongation time (green), the slope
was -9.123 (R² = 0.9975). As the magnitude of the slope of the relationship between fluorescence amplitude and
amplicon size increases, so does the ability to accurately resolve small differences in amplicon size. Larger
templates require longer elongation times for positive droplets to fluoresce discernibly above the background
level of droplet fluorescence.

Figure 6. (a) Data generated from 5.1 million reads in a single NGS run on our MiSeq® demonstrate optimal
coverage and diversity at the upstream seven base pair cypher in our vector library. (b) Rolling Cypher Seq
eliminates errors introduced during library preparation and sequencing. By ligating our target into our vector
containing double-stranded cyphers, grouping all reads with identical cyphers and their reverse complements
into families, and creating a consensus sequence, we can computationally eliminate errors introduced during
library preparation (yellow circles) and during sequencing (blue and purple circles). Only mutations that are
present in all reads (red circles) from the same cypher and its reverse complement will be counted as true.
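
The family-consensus idea in panel (b) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Cypher Seq pipeline itself: reads are assumed to arrive as (cypher, sequence) pairs, sequences are assumed to be already oriented to the same strand and of equal length, and all function names are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def family_key(cypher):
    """Give a cypher and its reverse complement the same family key."""
    return min(cypher, reverse_complement(cypher))


def consensus_by_family(reads):
    """Group reads by cypher family and call a strict consensus.

    A base is reported only when every read in the family agrees;
    any disagreement is masked as 'N' and would not be counted.
    """
    families = defaultdict(list)
    for cypher, seq in reads:
        families[family_key(cypher)].append(seq)
    consensus = {}
    for key, seqs in families.items():
        consensus[key] = "".join(
            column[0] if len(set(column)) == 1 else "N"
            for column in zip(*seqs)
        )
    return consensus
```

A fuller implementation would probably also require that each family contain reads from both the cypher and its reverse complement before a mutation is counted, as the legend describes; that check is omitted here for brevity.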

Figure 7. Exon 4 of TP53 was TOPO-cloned, transformed into E. coli, mini-prepped, and sequenced by Sanger
sequencing. Wild-type Exon 4 was then ligated into a library of Cypher Seq vectors and sequenced on the
Illumina MiSeq instrument at a depth of over a million. Sequences were then compared to the wild-type TP53
sequence and substitutions were plotted before (left panel) and after (right panel) correction with Cypher Seq.
Remaining base substitutions most likely reflect errors introduced during replication in E. coli prior to ligation
into the barcoded vectors.




Table  1:  Clonal  mtDNA  mutations  identified  in  patient-matched  tissue  and  blood. 


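Patient ID | Gleason Score (a) | Gene (b) | DNA (c) | Protein (d) | Tumor (%) (e) | Normal (%) (f) | Blood (%) (g) | Restriction Site (h)
-----------------------------------------------------------------------------------------------------------------------------------
23300B | 3+3 | ND5 | 13913T>C | L526P | 50 | N.D. | N.D. | Hpy188I
22871B | 3+3 | ATPase6 | 9038T>C | M171T | 50 | N.D. | N.D. | FatI, CviAII, NlaIII, HpyCH4V
       |     | D-loop | 16092T>C | | 40 | 70 | 80 |
23388H | 3+3 | | | | | | |
23529W | 3+3 | 16S rRNA | 2107G>A | | 100 | N.D. | N.D. | MnlI
23570B | 3+3 | ND4 | 11256A>G | Y166C | N.D. | 60 | N.D. |
23481P | 3+3 | | | | | | |
23002C | 3+4 | | | | | | |
23159R | 3+4 | D-loop | 16145G>A | | 100 | 20 | 10 |
22896V | 3+4 | | | | | | |
23204T | 3+4 | | | | | | |
23378W | 3+4 | ND1 | 3643G>A | V113I | 60 | N.D. | N.D. | Hpy166II
23390H | 3+4 | | | | | | |
23569O | 3+4 | ND3 | 10228T>C | L57S | 100 | N.D. | N.D. |
23036D | 4+3 | ND5 | 13525G>A | E397K | 100 | N.D. | 60 | TaqI
23171D | 4+3 | D-loop | 313delCCCCGCTTCT | | 50 | N.D. | 50 | AciI, BslI, FauI
       |     | ND2 | 4752T>C | S95P | 100 | N.D. | 50 |
       |     | D-loop | 16093T>C | | 100 | 60 | 80 |
22962M | 4+3 | D-loop | 309delC | | 50 | N.D. | N.D. |
22959H | 4+3 | | | | | | |
23253G | 4+3 | Cyt b | 15750T>C | L335P | 100 | N.D. | N.D. |
22927H | 4+3 | D-loop | 251G>A | | 70 | N.D. | N.D. |
       |     | Cyt b | 14846G>A | G34S | 100 | N.D. | N.D. |
22888L | 4+3 | D-loop | 16027T>C | | 100 | N.D. | 40 |
19575Y | 3+3 | | | | | | |
22951A | 3+3 | | | | | | |
22949P | 3+3 | tRNAE | 14724G>A | | 50 | N.D. | N.D. |
23127O | 3+3 | | | | | | |
22904L | 3+3 | ND1 | 3982G>A | A226T | 80 | N.D. | N.D. | CviKI-1
       |     | Cyt b | 15345T>C | L200S | 70 | N.D. | N.D. | HpyCH4V
22880H | 3+3 | D-loop | 234A>G | | 100 | 40 | 40 |
       |     | COXIII | 9942G>A | D246N | 50 | N.D. | N.D. |
23024K | 3+3 | | | | | | |
23270S | 3+3 | | | | | | |
22905A | 3+4 | | | | | | |
19416E | 3+4 | Cyt b | 14774insC | frameshift | 100 | N.D. | N.D. |
22860F | 3+4 | D-loop | 523delAC | | 100 | N.D. | N.D. |
       |     | 12S rRNA | 1282G>A | | 80 | N.D. | N.D. |
       |     | COXII | 8269G>A | | 100 | 20 | 30 | SfcI
       |     | ND3 | 10320A>G | I88V | 100 | N.D. | N.D. |
23027B | 3+4 | COXI | 6131A>G | | 60 | N.D. | N.D. | Hpy188I
       |     | COXI | 6910C>T | A336V | 80 | N.D. | N.D. | TseI, ApeKI, Fnu4HI, SfcI
22870S | 3+4 | D-loop | 16171G>A | | 60 | N.D. | N.D. |
22975M | 3+4 | | | | | | |
23201A | 3+4 | ND3 | 10264T>C | I69T | 70 | N.D. | N.D. | Tsp509I
23028W | 3+4 | 12S rRNA | 1547insT | | 100 | 70 | 50 |
23266N | 3+4 | | | | | | |
23312L | 3+4 | | | | | | |
23168H | 4+3 | ND5 | 12473T>C | I46T | 80 | N.D. | N.D. |
23183S | 4+5 | ND5 | 13480G>A | G382stop | 80 | N.D. | N.D. |
       |     | Cyt b | 14798T>C | F18L | 50 | N.D. | N.D. |
23461T | 4+3 partially 5 | 12S rRNA | 1440G>A | | N.D. | 50 | N.D. | DdeI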


a  Tumors were graded based on the Gleason Grading System, with the first number indicating the grade of the majority (>50%) of the tumor (on a scale from 1-5),
and the second number signifying the grade of the minority (<50%, but >5%) of the tumor.

b  The region or gene in the mitochondrial genome where the mutation was detected by Sanger sequencing.

c  The mutation is indicated by the base position in mtDNA, followed by the type of change. A T to C substitution at position 1000 would be described as 1000T>C,
while a deletion of a G at position 500 would be 500delG.

d  Amino acid change resulting from the mutation, indicated by the original amino acid followed by the position of the residue, and then the resulting amino acid.

e  Prevalence of mutation in LCM cancerous prostate tissue as a percentage, based on Sanger sequencing chromatogram reads.
N.D. indicates that the mutation was not detectable via Sanger sequencing.

f  Prevalence of mutation in LCM normal prostate tissue as a percentage, based on Sanger sequencing chromatogram reads.

g  Prevalence of mutation in blood DNA as a percentage, based on Sanger sequencing chromatogram reads.

h  If the mutation disrupts a restriction site, the corresponding restriction enzymes are listed.

Simultaneous digital quantification and fluorescence-based
size characterization of massively parallel sequencing libraries

Keywords: droplet digital PCR; absolute quantification; next-generation sequencing; massively parallel sequencing; library preparation; quality control; size determination


Due to the high cost of failed runs and suboptimal data yields, quantification and determination of fragment size range are
crucial  steps  in  the  library  preparation  process  for  massively  parallel  sequencing  (or  next-generation  sequencing).  Current 
library  quality  control  methods  commonly  involve  quantification  using  real-time  quantitative  PCR  and  size  determination 
using  gel  or  capillary  electrophoresis.  These  methods  are  laborious  and  subject  to  a  number  of  significant  limitations  that 
can  make  library  calibration  unreliable.  Herein,  we  propose  and  test  an  alternative  method  for  quality  control  of  sequencing 
libraries  using  droplet  digital  PCR  (ddPCR).  By  exploiting  a  correlation  we  have  discovered  between  droplet  fluorescence 
and  amplicon  size,  we  achieve  the  joint  quantification  and  size  determination  of  target  DNA  with  a  single  ddPCR  assay.  We 
demonstrate  the  accuracy  and  precision  of  applying  this  method  to  the  preparation  of  sequencing  libraries. 


Massively parallel next-generation sequencing (NGS) technology is rapidly revolutionizing the fields of
genomics, molecular diagnostics, and personalized medicine through the increasingly efficient and economical
generation of unprecedented volumes of data (1-7). A common characteristic of the various commercially
available NGS technologies is the need to load a precise number of viable DNA library molecules onto the
instrument to optimize the yield of data from an individual sequencing experiment (8-11). Performing a
sequencing run with either too many or too few library molecules results in compromised data yields or
completely failed sequencing runs that waste sample, expensive reagents, user time, and instrument time.
Similarly, if library molecules are not the appropriate length to fully utilize the capabilities of the
sequencing platform, fewer bases can be sequenced in an NGS run and throughput is wasted. Thus, accurate
quantification and size determination of library DNA are essential for achieving optimal data yield and
maximizing a laboratory's efficiency and sequencing throughput.


Protocols for the preparation of NGS libraries include quality control steps to validate the size and
concentration of amplifiable library molecules (i.e., molecules properly ligated to NGS adapter sequences)
before committing to a sequencing run. Manufacturers typically recommend quantification with quantitative
real-time PCR (qPCR) and size determination with gel or capillary electrophoresis. It is also possible to
enumerate library DNA using UV spectrophotometry, the Quant-iT PicoGreen assay, or the Agilent BioAnalyzer,
but these methods are not ideal because they quantify amplifiable and non-amplifiable molecules equally
(12-14). These methods are also only capable of measuring mass per volume, which must be converted to copy
number using an estimated average size of library molecules, a step that can introduce further error (15).
Although qPCR is widely considered the best option for library quantification, there are considerable
drawbacks to the method, including amplification biases due to template size and GC content as well as the
need for a standard curve to estimate the absolute quantity of DNA (16). Creating a standard curve for each
sample to be analyzed is a difficult and uncertain process that leads to inaccuracies in measurements of
absolute target quantity (15,17). When intercalating dyes are used for quantification, the concentration
reading can include non-amplifiable DNA, as dyes measure dsDNA indiscriminately. Because of these potential
inaccuracies, some NGS platform manufacturers recommend performing titration runs on their instrument to
determine the proper loading amount. The high cost of reagents and the length of NGS runs make this an
expensive and time-consuming step.
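
To make the mass-to-copy-number problem concrete, the following Python sketch converts a mass concentration to a molar concentration and shows how an error in the assumed average fragment size carries through. It is an illustration under stated assumptions: the function name is hypothetical, and the value of 660 g/mol per base pair is a commonly used approximation for double-stranded DNA rather than a figure from this report.

```python
AVG_MASS_PER_BP = 660.0  # g/mol per base pair of dsDNA (approximation)


def ng_per_ul_to_nM(conc_ng_per_ul, mean_length_bp):
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM).

    1 ng/uL equals 1e-3 g/L; dividing by the molar mass of an average
    fragment gives mol/L, and multiplying by 1e9 gives nM.
    """
    if mean_length_bp <= 0:
        raise ValueError("mean_length_bp must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_MASS_PER_BP * mean_length_bp)


# The same mass reading, interpreted with two assumed average sizes.
assumed_300 = ng_per_ul_to_nM(1.0, 300)  # about 5.05 nM
assumed_400 = ng_per_ul_to_nM(1.0, 400)  # about 3.79 nM
```

If the true average size is 400 bp but 300 bp is assumed, the molar concentration is overestimated by one third, which is the kind of error the text attributes to mass-based methods.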

We have developed a new assay capable of concurrently measuring the absolute concentration and length of
unknown amplifiable DNA templates, making it well suited for quality control of NGS libraries. The assay,
which we have termed QuantiSize, is based on the previously validated droplet digital PCR (ddPCR) absolute
quantification system (18,19) and adds the ability to calculate the size of target DNA by exploiting a linear
correlation we have discovered between the fluorescence amplitude of ddPCR droplets and the size of amplicons
within them.


Method  summary: 

QuantiSize  allows  for  the  determination  of  absolute  quantity  and  size  distribution  of  target  DNA  molecules  in  a  single  experiment. 
This  assay  exploits  a  correlation,  reported  herein,  between  the  length  of  an  amplified  DNA  molecule  and  the  fluorescence  amplitude 
produced  in  droplet  digital  PCR  (ddPCR),  to  allow  the  user  to  calculate  the  size  of  unknown  DNA.  As  ddPCR  simultaneously 
measures  the  concentration  of  target  DNA,  the  user  can  accurately  determine  the  target  population  size  and  quantity  in  a  single  step. 

As a quantification method, ddPCR has demonstrated greater precision
and sensitivity than real-time PCR (18). We demonstrate that QuantiSize
accurately measures the size and concentration of target DNA simultaneously
while avoiding the limitations of other quantification systems, and we
highlight the utility of this assay for preparation of NGS libraries.


Materials  and  methods 

Purification  of  DNA  size  standards 

An exACTGene 50 bp DNA Ladder (Fisher, Waltham, MA) and a 1 kb
Plus DNA Ladder (Fisher) were run on a 1.0% UltraPure Low-Melting
Point Agarose (Invitrogen, Carlsbad, CA) electrophoresis gel and the 25,
50, 100, 200, 300, 400, 500, 600, 700, 800, and 1000 bp ladder bands
were manually excised. The DNA in these gel slices was purified using
the QIAcube automated gel extraction protocol with the QIAquick Gel
Extraction Kit (QIAGEN, Hilden, Germany). The size and purity of all
DNA fragments were verified by gel electrophoresis.

Ligation  of  DNA  size  standards  to  adapter  sequences 

The  DNA  fragments  were  ligated  to  the  following  sequences  containing 
the Nextera version 1 adapters (Epicentre Biotechnologies, Madison, WI):

Adapter  1: 

5'-CAAGCAGAAGACGGCATACGAGATTCGCCTTACGGTCTGCCTTGCCAGCCCGCTCAGAGATGTGTATAAGAGACAG[index 1]CCC-3'

Adapter 2:

5'-GGG[index 2]CTGTCTCTTATACACATCTCTGATGGCGCGAGGGAGGCGCGATCTAGTGTAGATCTCGGTGGTCGCCGTATCATT-3'

All  size  standard  fragments  were  ligated  to  adapters  with  the  following 
7  bp  indices: 

Index 1: 5'-TACCTCT-3'

Index 2: 5'-ACACATT-3'

Ligations were carried out in 20 µL reactions containing 1 µL T4 DNA
Ligase and 2 µL 10x T4 DNA Ligase Buffer (New England BioLabs,
Ipswich, MA). The ligation reactions were incubated at room temperature
for 2 h. The ligated DNA was purified with a phenol/chloroform/isoamyl
alcohol extraction. All samples were sent to the Fred Hutchinson Cancer
Research Center ABI capillary sequencing facility to verify that the correct
insert had been ligated to the adapters in each sample.

Library preparation for Illumina MiSeq

Samples of the plasmid pET-23a were sheared to an average size of 150 bp
using the Covaris S220 Ultrasonicator (Covaris, Woburn, MA). The sheared
DNA was run on a 1.0% UltraPure Low-Melting Point Agarose gel and a
gel slice corresponding to ~100-200 bp was manually excised and purified
using the QIAcube gel extraction protocol. The sheared DNA was blunted
and phosphorylated using the Quick Blunting Kit (New England BioLabs)
and purified with a phenol/chloroform/isoamyl alcohol extraction.

Eight samples of sheared DNA were ligated to adapter sequences with
unique 7 bp indices in separate 20 µL reactions containing 1 µL T4 DNA
Ligase and 2 µL 10x T4 DNA Ligase Buffer. The ligation reactions were
incubated at room temperature for 2 h. All ligations were purified with a
phenol/chloroform/isoamyl alcohol extraction.

All  ligated  libraries  were  amplified  with  20  cycles  of  standard  PCR 
using  the  following  primer  sequences: 

Primer  1: 

5'-AATGATACGGCGACCACCGA-3'

Primer  2: 

5'-CAAGCAGAAGACGGCATACGA-3'

Figure 1. ddPCR amplification of 10 size standards designed for use with the
QuantiSize assay. All size standards were amplified in parallel with standard
reagent and thermal cycling conditions. (A) Scatter plot of fluorescence
amplitude of individual droplets for each size standard. Droplets whose
fluorescence amplitude is above a specified threshold ("positives") are shown in
black and droplets with fluorescence amplitude below the threshold
("negatives") are shown in gray. (B) Box-and-whisker plots showing distribution of
fluorescence amplitudes of positive droplets. Horizontal bars mark the mean
fluorescence amplitude, boxes mark the interquartile range, and whiskers
mark the 95% confidence interval. (C) Plot of mean fluorescence
amplitude ± SEM vs. amplicon size showing a linear correlation (R² = 0.9943).

Figure 2. Effect of ddPCR elongation time on the relationship between fluorescence
amplitude ± SEM and amplicon size. Three ddPCR experiments were carried
out with the same size standards using 0.5, 1, and 2 min elongation times during
droplet thermal cycling. With a 0.5 min elongation time (blue), the slope of the
regression line relating fluorescence amplitude to amplicon size was -13.760 (R² =
0.9905). With a 1 min elongation time (red), the slope was -11.460 (R² = 0.9906).
With a 2 min elongation time (green), the slope was -9.123 (R² = 0.9975). As the
magnitude of the slope of the relationship between fluorescence amplitude and
amplicon size increases, so does the ability to accurately resolve small differences
in amplicon size. Larger templates require longer elongation times for positive
droplets to fluoresce discernibly above the background level of droplet fluorescence.


The amplified libraries were quantified using the ddPCR system
(Bio-Rad, Hercules, CA) and the Quant-iT PicoGreen assay (Invitrogen).
In the ddPCR experiment, the libraries were run in parallel with
adapter-ligated size standards to allow for the estimation of library size
distribution. The measured concentrations of the 8 differently indexed
libraries were used to dilute and combine the libraries in a molar ratio of
100:50:10:1 with 2 libraries at each concentration. The combination of
libraries was denatured and diluted in preparation for loading onto the
MiSeq flow cell (Illumina, San Diego, CA) as per the Illumina protocol.
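
The pooling step above amounts to choosing volumes so that each library contributes molecules in the target ratio. The Python sketch below shows one way to do the arithmetic; it is an assumption-laden illustration rather than the authors' worksheet, and the function name, the scaling rule (largest volume fixed at a chosen maximum), and the example concentrations are hypothetical.

```python
def pooling_volumes(concentrations, ratios, max_volume_ul=10.0):
    """Volumes (uL) that combine libraries in the requested molar ratio.

    Molecules contributed by library i are concentration_i * volume_i,
    so volume_i must be proportional to ratio_i / concentration_i.
    Volumes are scaled so the largest one equals max_volume_ul.
    """
    if len(concentrations) != len(ratios):
        raise ValueError("need one ratio per library")
    if any(c <= 0 for c in concentrations) or any(r <= 0 for r in ratios):
        raise ValueError("concentrations and ratios must be positive")
    raw = [r / c for r, c in zip(ratios, concentrations)]
    scale = max_volume_ul / max(raw)
    return [v * scale for v in raw]


# Eight libraries, two at each level of the 100:50:10:1 ratio;
# the equal concentrations here are made-up example inputs.
example = pooling_volumes([1000.0] * 8, [100, 100, 50, 50, 10, 10, 1, 1])
```

In practice the smallest volumes would be too small to pipette accurately, so the low-ratio libraries would likely be pre-diluted first, consistent with the text's statement that libraries were diluted and then combined.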

Genomic DNA was purified from the human colon cancer cell line
HCT 116 using the QIAcube automated purification protocol with the
DNeasy Kit (QIAGEN). The Nextera XT DNA Sample Preparation Kit
(Illumina) was used to generate a MiSeq-compatible library from the HCT
116 DNA. The optional bead-based normalization step in the Nextera XT
protocol was omitted; instead, the library was quantified with ddPCR
and diluted to the 2 nM concentration required by
the MiSeq loading protocol. The 2 nM library was denatured and further
diluted as per the manufacturer's guidelines. The standard PhiX control
library (Illumina) was spiked into the denatured HCT 116 library at 5%
by volume. The library and PhiX mixture was then loaded into a MiSeq
300-Cycle v2 Reagent Kit (Illumina).
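
Normalizing to 2 nM from a ddPCR reading requires converting copies per microliter to molarity and then computing a dilution. The Python sketch below illustrates that arithmetic; the function names and the example stock concentration are hypothetical, and the ddPCR value is assumed to have already been corrected for any dilution made before droplet generation.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_per_ul_to_nM(copies_per_ul):
    """Convert copies/uL to nM (copies/uL -> copies/L -> mol/L -> nM)."""
    return copies_per_ul * 1e6 / AVOGADRO * 1e9


def dilution_volumes(stock_nM, target_nM, final_volume_ul):
    """Return (stock_ul, diluent_ul) for a C1*V1 = C2*V2 dilution."""
    if stock_nM < target_nM:
        raise ValueError("stock is already below the target concentration")
    stock_ul = final_volume_ul * target_nM / stock_nM
    return stock_ul, final_volume_ul - stock_ul


# A 2 nM library corresponds to roughly 1.2e9 copies per microliter.
two_nM_in_copies = 2e-9 * AVOGADRO / 1e6
stock_ul, diluent_ul = dilution_volumes(10.0, 2.0, 20.0)  # example stock
```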

TaqMan  probe  and  primer  design 

The  following  primers  (Invitrogen)  and  TaqMan  probe  (Applied 
Biosystems,  Foster  City,  CA)  were  designed  to  hybridize  to  the  Nextera 
adapter  sequences: 

Primer  1: 

5'-GCGACCACCGAGATCTACAC-3'

Primer 2:

5'-AGCAGAAGACGGCATACGAG-3'

Probe:

5'-FAM-CTGT+CT+CT+TA+TA+CA+CATC-IBFQ-3'

The "+" indicates that the previous base is a locked nucleic acid (LNA) base.


Droplet  digital  PCR 

All quantified DNA libraries and size standards were prepared for droplet
PCR in 25 µL reactions containing 2x ddPCR Master Mix (Bio-Rad),
250 nM TaqMan probe, 900 nM each of the appropriate flanking primers,
and ~10,000 copies of target DNA. Emulsified 1 nL reaction droplets were
made by adding 20 µL of each reaction mixture to the sample wells of a
droplet generator DG8 cartridge (Bio-Rad) and 70 µL ddPCR Droplet
Generation Oil (Bio-Rad) to the oil wells of the cartridge for use in the
QX100 Droplet Generator (Bio-Rad). Forty microliters of the generated
droplet emulsions were transferred to Twin.tec semi-skirted 96-well PCR
plates (Eppendorf, Hamburg, Germany), which were then heat sealed with
pierceable foil sheets. To amplify the target DNA, the droplet emulsions
were thermally cycled using the following protocol: initial denaturation at
95°C for 10 min, followed by 40 cycles of 94°C for 30 s and 60°C for 1 min.
The fluorescence of each thermally cycled droplet was measured using the
QX100 Droplet Reader. All measurements were performed in triplicate.

Data  analysis 

The equation of the line fitting the correlation between amplicon size and
fluorescence amplitude for the size standards was generated using Microsoft
Excel (Redmond, WA) and applied to the measured fluorescence amplitude
of each sequencing library to calculate amplicon size. The .fastq data files
produced by the MiSeq were imported to Sequencher (Gene Codes, Ann
Arbor, MI) and aligned to the pET-23a plasmid sequence to generate a
sequence alignment/map (SAM) file. A Perl script was used to count the
length of each read pair by retrieving the number corresponding to the
"TLEN" field of the SAM file. Only library molecules for which both
paired-end reads passed the quality filter were included in the analysis.
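
The fragment-length tally described above can be sketched as follows. The authors used a Perl script that is not reproduced in this report; this Python version is a stand-in written for illustration, and the function name is hypothetical. It relies only on the SAM convention that TLEN is the ninth tab-separated field and that flag bit 0x200 marks a read failing quality checks. As a simplification, it checks the filter flag only on the mate that carries the positive TLEN.

```python
from collections import Counter

QC_FAIL = 0x200  # SAM flag bit: read fails platform/vendor quality checks


def fragment_lengths(sam_lines):
    """Count observed template lengths (TLEN) from SAM alignment lines.

    Each pair is counted once by keeping only the mate whose TLEN is
    positive; header lines, malformed lines, and QC-failed reads are
    skipped.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])
        if flag & QC_FAIL or tlen <= 0:
            continue
        counts[tlen] += 1
    return counts
```

Called with an open file handle, for example `fragment_lengths(open(path))` where `path` is a placeholder, it returns a mapping from fragment length to the number of read pairs of that length.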


Results  and  discussion 

The ability of the QuantiSize assay to combine quantification and size
determination in a single ddPCR experiment is derived from a correlation
that exists between the fluorescence amplitude of droplets and the size of
amplicons within them. With standard ddPCR reagent concentrations,
DNA amplification is eventually limited by the availability of dNTPs
and inhibited by the presence of pyrophosphate (19,20); thus long DNA
templates, which consume more dNTPs and generate more pyrophosphate,
will produce fewer products than short templates at the end point
of a standard reaction. Because the final number of products generated
within a droplet determines its level of fluorescence, the measured
fluorescence amplitude of droplets containing short templates will be greater
than that of droplets containing long templates. The QuantiSize assay
exploits this fact to generate an equation relating fluorescence amplitude
to amplicon size by using measurements of known size standards. The
equation describing the relationship between fluorescence amplitude and
amplicon size can be used to calculate the size of any unknown ddPCR
template that shares common primer and probe binding sites with the size
standards. Creating size standards that have primer and probe binding
sites in common with DNA samples can be accomplished in a number
of ways, including cloning sample DNA into a vector and appending
adapter sequences to both the sample DNA and size standards (21).

We created a set of size standards applicable to Illumina NGS libraries,
containing inserts ranging from 25 to 1000 base pairs flanked by adapter
sequences compatible with the Illumina MiSeq platform. A pair of
primers and a fluorescent TaqMan probe were designed to hybridize
to the adapter sequences such that the length of each amplicon is 160
bp plus the length of the insert. As primers and probe are specific to the
MiSeq adapter sequences, only adapter-ligated molecules that will be
amplifiable on the MiSeq flow cell will be quantified.
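
The 160 bp figure can be checked against the adapter and TaqMan primer sequences listed in the Materials and methods. The Python sketch below is an illustration under stated assumptions: it models one strand as Adapter 1, then the insert, then Adapter 2, using the two 7 bp indices given for the size standards, and the function names are hypothetical.

```python
ADAPTER_1 = ("CAAGCAGAAGACGGCATACGAGATTCGCCT"
             "TACGGTCTGCCTTGCCAGCCCGCTCAGA"
             "GATGTGTATAAGAGACAG" "TACCTCT" "CCC")   # index 1 = TACCTCT
ADAPTER_2 = ("GGG" "ACACATT"                        # index 2 = ACACATT
             "CTGTCTCTTATACACATCTCTGATGGC"
             "GCGAGGGAGGCGCGATCTAGTGTAGATCTCGGTGGTC"
             "GCCGTATCATT")
PRIMER_1 = "GCGACCACCGAGATCTACAC"   # TaqMan assay primer 1
PRIMER_2 = "AGCAGAAGACGGCATACGAG"   # TaqMan assay primer 2

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_length(insert):
    """Length of the ddPCR amplicon for an adapter-ligated insert."""
    template = ADAPTER_1 + insert + ADAPTER_2
    start = template.find(PRIMER_2)            # primer 2 matches this strand
    site = reverse_complement(PRIMER_1)        # primer 1 anneals to this site
    end = template.rfind(site)
    if start < 0 or end < 0:
        raise ValueError("primer binding site not found")
    return end + len(site) - start
```

With an empty insert the function returns 160, and each inserted base adds one to the amplicon length, in agreement with the statement above.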

A  ddPCR  experiment  was  performed  with  the  aforementioned  size 
standards  in  separate  wells  of  a  96-well  plate.  Droplets  containing  the  target 
(positive)  increased  in  fluorescence  following  amplification  of  the  target 
whereas  droplets  lacking  the  target  (negative)  remained  at  the  background 
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Figure  3.  Cluster  density  and  number  of  sequencing  reads  ±  SEM  across  multiple 
sequencing  runs  performed  using  QuantiSize.  (A)  Eight  uniquely  indexed  libraries 
were  loaded  onto  the  MiSeq  with  two  libraries  at  each  concentration.  The 
libraries  were  loaded  in  a  concentration  ratio  of  100:50:10:1,  based  on  ddPCR 
measurements.  Due  to  the  binding  kinetics  of  library  molecules  on  the  MiSeq  flow 
cell,  the  number  of  reads  generated  by  the  MiSeq  is  expected  to  be  a  fraction  of  the 
number  of  library  molecules  loaded.  The  relative  numbers  of  MiSeq  reads  for  each 
library  closely  correspond  to  the  relative  numbers  of  molecules  loaded  according  to 
ddPCR  measurements  (R2  =  0.9693).  (B)  Mean  cluster  density  (±SEM)  resulting 
from  three  separate  sequencing  runs  using  Nextera-prepared  samples  normalized 
based  on  QuantiSize  measurements.  The  target  cluster  density  (represented  by  a 
horizontal  dashed  line)  was  1.028xl06  clusters/mm2  including  a  5%  phi  X  control 
and  the  observed  mean  cluster  density  from  the  3  runs  was  1.039  ±  0.053xl06 
cluster/mm2.  (C)  Mean  number  of  reads  (±SEM)  resulting  from  three  separate 
sequencing  runs.  The  observed  mean  number  of  reads  was  1.813  ±  0.070xl07. 

level  of  fluorescence  (Figure  1  A).  The  distribution  of  droplet  amplitudes  is 
consistent  across  most  amplicon  lengths,  but  the 760  and 860 bp  amplicons 
show  a  broader  distribution  of  amplitudes  (Figure  IB).  An  inverse,  linear 
correlation  between  amplicon  size  and  mean  fluorescence  amplitude 
was  observed  (R2  =  0.99436)  (Figure  1C).  The  equation  describing  this 
correlation  allows  for  the  calculation  of  amplicon  size  given  a  measured 
fluorescence  amplitude.  The  slope  of  this  equation  provides  a  measure 
of  the  difference  in  mean  fluorescence  amplitude  that  is  expected  with  a 
given  difference  in  amplicon  size.  Maximizing  the  magnitude  of  this  slope 
maximizes  the  resolution  of  size  standards,  which  is  advantageous  for  the 
purpose  ofdetermining  the  length  ofunknown  amplicons  more  accurately. 
The  size  standards  used  for  QuantiSize  are  highly  analogous  to  the  standards 
used  in  gel  and  capillary  electrophoresis.  The  size  of  unknown  DNA  can 
be  determined  by  visually  comparing  the  fluorescence  amplitude  of  the 
size  references  to  that  of  the  unknown  DNA  or  by  entering  the  fluores¬ 
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cence  amplitude  value  into  the  equation  describing  the  relationship  between 
average  fluorescence  amplitude  and  amplicon  size  for  the  size  standards. 
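
The standard-curve calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the size standards and amplitude values are hypothetical, and the helper names `fit_line` and `size_from_amplitude` are introduced here only for the example.

```python
from statistics import mean


def fit_line(sizes_bp, amplitudes):
    """Least-squares fit of amplitude = slope * size + intercept."""
    x_bar, y_bar = mean(sizes_bp), mean(amplitudes)
    sxx = sum((x - x_bar) ** 2 for x in sizes_bp)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sizes_bp, amplitudes))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def size_from_amplitude(amplitude, slope, intercept):
    """Invert the standard curve to estimate amplicon size in bp."""
    return (amplitude - intercept) / slope


# Hypothetical size standards (bp) and mean droplet amplitudes (arbitrary units).
standard_sizes = [160, 260, 360, 460, 560, 660]
standard_amplitudes = [9000, 7800, 6600, 5400, 4200, 3000]

slope, intercept = fit_line(standard_sizes, standard_amplitudes)
estimated_size = size_from_amplitude(6000, slope, intercept)
print(f"slope = {slope:.2f}, estimated size = {estimated_size:.0f} bp")
```

The negative slope reflects the inverse relationship between amplicon size and fluorescence amplitude; an unknown sample is sized by passing its mean amplitude through the inverted equation.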

The  droplet  reader  software  counts  positive  and  negative  droplets 
by using a threshold of fluorescence between the well-defined populations
of high and low fluorescence amplitude droplets. One particular
TaqMan  probe  displayed  a  fluorescence  amplitude  for  droplets  containing 
amplicons  larger  than  660  bp  that  is  too  low  to  reliably  discriminate 
between  positive  and  negative  droplets  when  templates  are  amplified 
with a 1 min elongation time. When this is the case, the average fluorescence
amplitude for these amplicons cannot be calculated. Increasing the
elongation  time  to  two  minutes  increases  the  fluorescence  amplitude  of 
all  droplets  containing  amplifiable  template  (Figure  2).  This  enables  the 
acquisition  of  accurate  concentration  and  fluorescence  amplitude  data  for 
longer  templates,  but  the  slope  of  the  relationship  between  amplicon  size 
and fluorescence amplitude is decreased (from m = -11.66 to m = -9.12),
which  decreases  the  ability  to  resolve  small  differences  in  amplicon  size 
(Figure  2).  Decreasing  the  elongation  time  to  30  s  increases  the  resolution 
of  the  relationship  between  amplicon  size  and  fluorescence  amplitude, 
but  prevents  targets  longer  than  460  bp  from  amplifying  to  the  point 
that  they  fluoresce  detectably  above  the  background  fluorescence  (Figure 
2).  This  is  likely  due  to  the  fact  that  longer  products  require  more  time 
for  complete  polymerization  of  nascent  strands  to  occur.  Thus,  there  is 
a  tradeoff  between  the  resolution  and  range  of  QuantiSize,  though  the 
assay  can  be  easily  adjusted  to  fit  particular  experimental  needs. 
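
The effect of the slope on resolution can be illustrated with a short calculation. The two slopes are the values reported above; the assumption that they are expressed in fluorescence amplitude units per base pair, and the 50 bp size difference, are ours and serve only as an example.

```python
def amplitude_shift(slope, size_difference_bp):
    """Expected change in mean amplitude for a given size difference."""
    return abs(slope) * size_difference_bp


SLOPE_1_MIN = -11.66  # slope reported for a 1 min elongation time
SLOPE_2_MIN = -9.12   # slope reported for a 2 min elongation time

# Assumed example: two amplicons that differ in length by 50 bp.
shift_1_min = amplitude_shift(SLOPE_1_MIN, 50)
shift_2_min = amplitude_shift(SLOPE_2_MIN, 50)

# Fractional loss of amplitude separation when moving to the longer elongation.
relative_loss = 1 - abs(SLOPE_2_MIN) / abs(SLOPE_1_MIN)
print(f"{shift_1_min:.0f} vs {shift_2_min:.0f} units, loss = {relative_loss:.1%}")
```

Under these assumptions, the longer elongation time reduces the amplitude separation between two sizes by roughly 22%, which is the cost paid for extending the measurable size range.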

To validate the use of the QuantiSize assay for the sizing and quantification
steps in library preparation for NGS, we compared the quantity and
size  distribution  of  library  DNA  predicted  by  the  QuantiSize  assay  to  the 
quantity  and  size  distribution  of  reads  generated  by  the  Illumina  MiSeq 
platform.  Eight  uniquely  indexed  test  libraries  were  generated  by  ligating 
DNA sheared to an average size of 150 bp onto MiSeq-compatible adapter
sequences similar to those used to create the aforementioned size standards.
The libraries were run in individual wells of a ddPCR experiment alongside
the  set  of  size  standards.  Using  the  concentrations  measured  by  ddPCR, 
the  8  uniquely  indexed  libraries  were  diluted  and  combined  in  a  molar 
ratio of 100:50:10:1 with 2 libraries at each concentration. The observed
number  of  MiSeq  reads  containing  each  index  was  compared  with  the 
expected  number  of  copies  of  each  uniquely  indexed  library  loaded  onto 
the  MiSeq.  The  observed  ratio  of  the  number  of  reads  containing  each 
index  very  closely  matched  the  expected  ratio  of  100:50:10:1  that  was 
measured  by  ddPCR  and  the  correlation  between  the  expected  and  actual 
number  of  library  molecules  gave  an  R2  value  of 0.9693  (Figure  3A). 
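
The comparison of expected and observed index counts can be sketched as below. The 100:50:10:1 ratio with two libraries per level comes from the text; the observed read counts are hypothetical, and the squared Pearson correlation is only one plausible way to obtain an R2 value, since the exact calculation is not stated here.

```python
def expected_fractions(ratio, replicates):
    """Expected share of reads for each library pooled at the given molar ratio."""
    total = sum(ratio) * replicates
    return [r / total for r in ratio for _ in range(replicates)]


def r_squared(expected, observed):
    """Squared Pearson correlation between expected and observed values."""
    n = len(expected)
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in observed)
    return sxy ** 2 / (sxx * syy)


fractions = expected_fractions([100, 50, 10, 1], replicates=2)

# Hypothetical read counts per index, in the same order as `fractions`.
observed_reads = [3100, 3050, 1600, 1500, 320, 300, 30, 35]
r2 = r_squared(fractions, observed_reads)
print([round(f, 4) for f in fractions], round(r2, 4))
```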

The  Nextera  XT  DNA  Sample  Preparation  Kit  was  used  to  prepare  a 
sequencing  library  from  genomic  DNA  extracted  from  the  human  colon 
cancer cell line HCT 116. In lieu of the optional bead normalization step
in  the  Nextera  XT  protocol,  the  concentration  and  size  distribution  of 
the  library  were  measured  with  QuantiSize  and  the  library  concentration 
was  adjusted  to  the  proper  concentration  based  on  this  measurement. 
The  process  of  quantifying,  normalizing,  denaturing,  and  loading  the 
library  onto  the  MiSeq  was  repeated  three  times  to  demonstrate  the 
precision  of  QuantiSize  for  predicting  cluster  density.  The  target  cluster 
density for each MiSeq run was 1.0 × 10^6 clusters/mm^2 (1.028 × 10^6 clusters/mm^2
including the PhiX control DNA). The mean cluster density ±
SEM obtained was 1.039 ± 0.053 × 10^6 clusters/mm^2 (Figure 3B). The
mean number of reads ± SEM obtained was 1.813 ± 0.070 × 10^7 (Figure
3C).  There  are  several  potential  sources  of  error  in  the  MiSeq  sample 
loading  process  including  the  pipetting  of  small  volumes,  variability 
in  the  efficiency  of  the  denaturation  reaction,  variability  of  flow  cell 
surface  area,  and  user  error.  These  factors  may  account  for  some  of  the 
observed  variance  in  cluster  density.  However,  even  with  the  potential 
error  caused  by  these  factors,  the  target  cluster  density  was  achieved  with 
high  precision  using  the  QuantiSize  method. 
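
As a rough check on the figures quoted above, the deviation from the target density and the run-to-run spread can be computed directly. The inputs are the reported values; converting SEM to a standard deviation with n = 3 runs is a standard identity applied here for illustration, not a value reported by the authors.

```python
TARGET_DENSITY = 1.028e6  # clusters/mm^2, including the PhiX control
OBSERVED_MEAN = 1.039e6   # clusters/mm^2, mean of three runs
OBSERVED_SEM = 0.053e6    # clusters/mm^2
N_RUNS = 3

# How far the mean landed from the target, as a fraction of the target.
relative_deviation = (OBSERVED_MEAN - TARGET_DENSITY) / TARGET_DENSITY

# Spread of the runs relative to the mean.
relative_sem = OBSERVED_SEM / OBSERVED_MEAN

# SEM = SD / sqrt(n), so SD = SEM * sqrt(n).
estimated_sd = OBSERVED_SEM * N_RUNS ** 0.5

print(f"{relative_deviation:.2%} from target, SEM {relative_sem:.1%} of mean")
```

The mean therefore sits about 1% above the target, while the standard error is about 5% of the mean.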

Figure 4. Comparison of library molecule size distribution. The QuantiSize
assay was performed on a DNA library prepared for the MiSeq in
order to predict the distribution of library molecule sizes. The DNA library
was amplified in parallel with a set of size standards using the same primers
and TaqMan probe, allowing us to estimate the expected amplicon
size within each individual droplet. The resulting size distribution is shown
in blue. The actual size distribution was determined through paired-end
sequencing on the Illumina MiSeq system (shown in red). Both histograms
show the relative frequency of measured molecule sizes in ten base pair
bins. The size distribution measured by QuantiSize is naturally wider than
the distribution measured by the MiSeq due to the inherent variance in
droplet amplitude that occurs even with amplicons of the same length.

The equation relating amplicon size and fluorescence amplitude can be
applied either to the average (mean or median) fluorescence amplitude of
a sample or to the fluorescence amplitude of individual droplets. Applying
the equation to individual droplets allows for a more detailed analysis
of  the  distribution  of  product  sizes  present  in  a  sample.  We  applied  the 
equation generated by the adapter-ligated size standards to the fluorescence
amplitude of individual droplets containing library DNA to
calculate  the  expected  amplicon  size  within  each  droplet  and  compared 
the  distribution  of  sizes  to  the  distribution  of  read  sizes  measured  by  the 
MiSeq  (Figure  4).  The  frequency  distribution  shows  a  high  degree  of 
overlap  and  a  common  center  point  for  the  estimation  made  using  the 
QuantiSize  assay  and  the  observations  from  the  MiSeq.  As  depicted  in 
Figure  2A,  there  is  an  inherent  variance  in  droplet  amplitude  that  occurs 
even  within  a  completely  homogeneous  sample  of  amplicon  lengths. 
This  variance  likely  accounts  for  the  wider  distribution  of  product  sizes 
estimated by ddPCR than was observed in the MiSeq data. Size determination
with QuantiSize provides a detailed calculation of the distribution
of fragment sizes present in a sample, whereas gel or capillary
electrophoresis provides only a range of sizes.
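
The per-droplet size estimate and the ten base pair binning used for the histogram can be sketched as follows. The slope, intercept, and droplet amplitudes are hypothetical placeholders rather than measured values, and the function names are introduced only for this example.

```python
from collections import Counter


def droplet_sizes(amplitudes, slope, intercept):
    """Apply the inverted standard curve to each droplet amplitude."""
    return [(a - intercept) / slope for a in amplitudes]


def binned_frequencies(sizes_bp, bin_width=10):
    """Relative frequency of sizes in fixed-width bins, keyed by bin start."""
    counts = Counter(int(s // bin_width) * bin_width for s in sizes_bp)
    total = len(sizes_bp)
    return {start: n / total for start, n in sorted(counts.items())}


# Hypothetical standard curve and positive-droplet amplitudes.
slope, intercept = -12.0, 10920.0
amplitudes = [8100, 8040, 7990, 8160, 7920, 8055]

sizes = droplet_sizes(amplitudes, slope, intercept)
histogram = binned_frequencies(sizes)
print(histogram)
```

Because every droplet carries some amplitude noise, a histogram built this way will be broader than the true size distribution, as noted above.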

The  QuantiSize  assay  demonstrates  accuracy,  reliability,  and  flexibility 
through  the  strength  of  the  correlation  between  fluorescence  amplitude 
and  amplicon  size  in  ddPCR  experiments,  and  the  ease  with  which  the 
assay  can  be  adjusted  to  fit  specific  experimental  needs.  Applying  the 
QuantiSize  assay  to  NGS  library  preparation  avoids  the  limitations  of 
other  independent  quantification  and  size  determination  methods  and 
has  the  potential  to  increase  the  average  yield  of  usable  data  generated  from 
sequencing  runs,  thereby  increasing  the  efficiency  and  throughput.  The 
ability  to  determine  the  absolute  quantity  and  the  detailed  size  distribution 
of  target  DNA  with  a  single  experiment  will  be  useful  for  a  broad  range 
of  applications  that  require  the  quantification  and  sizing  of  target  DNA. 
Summary

Due  largely  to  the  inability  to  accurately  quantify  and  characterize 
de  novo  deletion  events,  the  mechanisms  underpinning  the 
pathogenic expansion of mtDNA deletions in aging and neuromuscular
disorders remain poorly understood. Here, we outline
and validate a new tool termed 'Digital Deletion Detection' (3D)
that allows for high-resolution analysis of rare deletions occurring
at frequencies as low as 1 × 10^-8. 3D is a three-step process that
includes targeted enrichment for deletion-bearing molecules,
single-molecule partitioning of genomes into thousands of droplets
for direct quantification via droplet digital PCR, and breakpoint
characterization  using  massively  parallel  sequencing.  Using  3D,  we 
interrogated  over  8  billion  mitochondrial  genomes  to  analyze  the 
age-related  dynamics  of  mtDNA  deletions  in  human  brain  tissue. 
We  demonstrate  that  the  total  deletion  load  increases  with  age, 
while  the  total  number  and  diversity  of  unique  deletions  remain 
constant.  Our  data  provide  support  for  the  hypothesis  that 
expansion of pre-existing mutations is the primary factor contributing
to age-related accumulation of mtDNA deletions.

Key words: aging; genome instability; mitochondrial disease;
mitochondrial DNA; next-generation sequencing; rare
deletion detection


Introduction 

The  human  mitochondrial  genome  is  a  small  (16.5  kb)  circular  DNA 
molecule  that  is  present  in  multiple  copies  per  cell  (between  1000  and 
10  000  copies  depending  on  the  cell  type)  (Berdanier  &  Everts,  2001). 
This  small  genome  is  densely  packed  with  13  structural  genes  that 
encode the major catalytic components of the core complexes involved
in  oxidative  phosphorylation  (OXPHOS),  as  well  as  22  tRNAs  and  2  rRNAs 
that  are  essential  for  mitochondrial  protein  synthesis  (Scheffler,  2008). 
Because  of  the  density  of  the  gene  structure,  deletions  in  mitochondrial 
DNA  (mtDNA)  tend  to  affect  multiple  genes,  including  several  essential 
tRNAs. 

Accumulated  mitochondrial  deletions  are  known  to  cause  a  number 
of neuromuscular disorders, including Kearns-Sayre syndrome, progressive
external ophthalmoplegia, and Pearson syndrome (Chinnery, 1993;
Berdanier & Everts, 2001; Greaves et al., 2012). These diseases are
typically  (but  not  exclusively)  associated  with  a  4977-bp  'common' 
deletion  between  np  8482  and  np  13  460.  Additionally,  an  increasing 
number  of  associations  are  being  discovered  between  mtDNA  and 
cancer.  Cancer-associated  deletions  tend  to  be  smaller  (<  1  kb)  than 
those associated with neuromuscular disorders (Lee et al., 2010).
Whereas  accumulation  of  large  deletions  leads  to  mitochondrial 
dysfunction  and  apoptosis,  it  is  thought  that  small  deletions  may  confer 
milder  phenotypes  that  can  promote  tumor  cell  proliferation,  drug 
resistance,  and  malignancy.  Finally,  accumulation  of  mtDNA  deletions  in 
postmitotic  tissue  (e.g.,  brain,  heart,  and  skeletal  muscle)  is  thought  to 
be  an  important  driving  force  in  both  physiological  and  accelerated 
aging (Cortopassi & Arnheim, 1990; Meissner et al., 2008; Vermulst
et al., 2008b; Khrapko & Vijg, 2009).

In  neuromuscular  disorders,  cancer  and  aging,  the  pathological 
mtDNA  deletions  appear  to  be  somatically  acquired  (Meissner,  2007; 
Meissner et al., 2008). Furthermore, individual mitochondrial mutations
must expand above a threshold intracellular frequency, typically 60-90%
of a cell's mtDNA, before they reach phenotypic expression (Vermulst
et al., 2012). Thus, the etiology of mitochondrial deletion diseases
necessarily involves two distinct processes: the somatic generation of the
deletion(s) and their subsequent expansion to phenotypic levels.
However, neither of these processes is well understood (Krishnan et al.,
2008; de Grey, 2009; Song et al., 2011). One of the key difficulties is a
lack of sensitive assays to detect de novo deletions, which in normal
tissue may occur at fewer than 1 deletion per million genomes, and to track
the kinetics of their initial selection. Current assays lack the sensitivity to
capture these rare events without first amplifying the target sites,
typically via PCR (Cortopassi & Arnheim, 1990; He et al., 2002; Chabi
et al., 2003; Kraytsberg et al., 2008). This practice is subject to
introduction  of  numerous  artifacts,  is  biased  toward  amplification  of 
large  products,  and  often  only  allows  detection  of  a  subset  of  deletions 
that  have  already  undergone  some  level  of  expansion.  The  increasingly 
large  body  of  work  devoted  to  elucidating  the  mechanisms  by  which 
somatically acquired deletions undergo intra- and/or intercellular expansion
serves to underscore the need for more sensitive tools to study this
important phenomenon (Cortopassi & Arnheim, 1990; Coller et al.,
2001; Foury et al., 2004; Durham et al., 2006; Krishnan et al., 2008;
Fukui & Moraes, 2009; Kato et al., 2011; Payne et al., 2011; Song et al.,
2011; Freyer et al., 2012; Vermulst et al., 2012).

To  more  sensitively  characterize  the  formation  and  expansion  of 
mitochondrial  deletions,  we  have  developed  a  new  procedure  for 
quantitative  analysis  of  rare  deletion  events.  This  assay,  termed  'Digital 
Deletion  Detection'  (3D),  allows  us  to  directly  quantify  and  characterize 
site-specific rare mitochondrial deletions that occur at frequencies as low
as 1 deletion per 100 million genomes. We demonstrate that 3D is
accurate over a broad dynamic range and is capable of detecting both
specific and random deletion events within a targeted region of the
mitochondrial genome. We have successfully used 3D to study
accumulation of clonal and random mitochondrial deletions in human
brain tissue with respect to age, allowing a high-resolution analysis of
deletion dynamics in aging tissue.

© 2013 John Wiley & Sons Ltd and The Anatomical Society

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and
distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.

Digital deletion detection, S. D. Taylor et al.

Results 

Assay  design 

Digital  Deletion  Detection  (3D)  is  an  extremely  sensitive  tool  for  the 
absolute  quantification  and  characterization  of  rare  deletion  molecules. 
The  basic  strategy  behind  3D  is  a  three-step  process:  enrich,  amplify,  and 
analyze.  The  first  step,  based  on  methods  developed  previously  by  Bielas 
and  colleagues,  enriches  for  deletion-bearing  molecules  and  improves 
mutant specificity (Bielas & Loeb, 2005; Vermulst et al., 2008a). This step
consists  of  targeted  endonucleolytic  digestion  of  templates  to  selectively 
digest  wild-type  (WT)  molecules,  thus  allowing  the  preferential  PCR 
amplification  of  molecules  bearing  an  appropriate  deletion  (Fig.  1A). 
After  digestion,  the  DNA  molecules  are  sequestered  into  homogenous 
1  nL  water-in-oil  emulsion  droplets  and  subjected  to  normal  PCR 
amplification (Fig. 1B). The concentration of molecules within the
droplets is adjusted such that most droplets contain no mutant genomes,
while a small fraction contains only one. Thus, a single well in the
reaction actually consists of many thousand single-molecule reaction
chambers.  This  process  allows  each  captured  deletion  to  be  amplified 
without  bias  and  without  introducing  many  of  the  PCR  artifacts  that  are 
common  to  bulk  amplification  reactions  (i.e.,  template  switching  and 
preferential  amplification  of  short  templates). 

Following  amplification,  the  deletions  can  be  analyzed  via  two  process 
pathways.  In  the  quantification  pathway,  high-resolution  quantification 
of  deletions  is  accomplished  through  the  use  of  droplet  digital  PCR 
(ddPCR) (Pinheiro et al., 2012). With the inclusion of TaqMan reporter
chemistry,  droplets  bearing  amplified  templates  are  readily  distinguished 
by  their  fluorescence  amplitude  using  a  cytometry  system.  Because  the 
droplet  volumes  are  highly  uniform,  Poisson  statistics  can  be  applied  to 
calculate  the  average  number  of  deletion-bearing  molecules  per  droplet 
and to determine the absolute concentration of mutant molecules with
high precision and accuracy (Pinheiro et al., 2012). Alternatively, in the
characterization pathway, droplets are disrupted and amplicons recovered.
The deletions can then be directly sequenced using high-throughput
or 'next-generation' sequencing or cloned for use in Sanger
sequencing  or  other  downstream  applications. 
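
The Poisson correction used in droplet digital PCR can be sketched as follows. The 1 nL droplet volume is taken from the text; the droplet counts are hypothetical, and the function names are introduced here only for illustration.

```python
import math


def copies_per_droplet(positive, total):
    """Mean template copies per droplet from the fraction of positive droplets."""
    if not 0 <= positive < total:
        raise ValueError("positive must satisfy 0 <= positive < total")
    # Under Poisson loading, P(empty droplet) = exp(-lambda).
    return -math.log(1 - positive / total)


def concentration_per_ul(positive, total, droplet_volume_nl=1.0):
    """Template concentration in copies per microliter of partitioned reaction."""
    lam = copies_per_droplet(positive, total)
    return lam / (droplet_volume_nl * 1e-3)  # convert nL to µL


# Hypothetical well: 300 positive droplets out of 15,000 accepted droplets.
lam = copies_per_droplet(300, 15000)
conc = concentration_per_ul(300, 15000)
print(f"lambda = {lam:.5f} copies/droplet, {conc:.1f} copies/µL")
```

The logarithm accounts for droplets that received more than one template; at low occupancy the result is close to the raw positive fraction, which is why dilute loading gives near-digital counting.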

Fig. 1 Overview of Digital Deletion Detection (3D). (A) Enrichment of deletion-bearing molecules. WT molecules harbor endonuclease recognition sites within the
target region. Upon digestion, the target is cleaved, making the WT molecule unsuitable as a template for PCR amplification. In contrast, mutant molecules that
harbor deletions and remove the restriction recognition sites are resistant to digestion. These molecules serve as templates for PCR amplification. The presence of the
TaqMan® hydrolysis probe allows for the detection and enumeration of each molecule in the sample bearing the appropriate deletion. (B) Mutant target molecules
(depicted as an idealized, unbroken circular mitochondrial chromosome) are individually sequestered into 1 nL water-in-oil droplets along with TaqMan® PCR chemistry
and target-specific TaqMan® probes. Droplets are thermally cycled. Because the average number of molecules per droplet is less than one, positive droplets (green droplets)
represent individual reaction vessels for single-molecule quantitative PCR amplification. Droplets are individually scanned and scored as positive or negative, thus providing
a digital quantification of all deletion-bearing molecules within the sample. Alternatively, droplets can be disrupted and the amplification products subjected to physical
characterization, for example cloning, sequencing, or other applications.

Sensitivity and recovery

Using the quantification process pathway of 3D (Fig. 1), we measured
the absolute deletion frequency within a region spanning the ND1/ND2
genes in mitochondrial DNA isolated from human epithelial cells in tissue
culture. We measured the deletion frequency to be 1.6 ± 0.4 deletions
per ten million genomes (or 1.6 × 10^-7 per genome) (Fig. 2). We next
asked  whether  3D  was  able  to  fully  recover  all  of  the  deletions  within  a 
sample  over  a  broad  range  of  deletion  frequencies.  To  address  this,  we 
performed  a  series  of  reconstruction  experiments.  First,  a  plasmid 
harboring  a  fragment  of  mtDNA  containing  a  known  deletion  in  the 
ND1/ND2 region was mixed at a constant concentration (3 copies µL^-1)
against increasingly higher levels of genomic mtDNA (up to 2.5 × 10^6
copies µL^-1). We then performed 3D analysis to determine whether the
small  concentration  of  the  control  molecules  could  be  accurately 
quantified  in  the  presence  of  increasing  concentrations  of  background 
DNA  (Fig.  2).  This  reconstruction  demonstrated  accurate  quantification 
of  target  molecules  across  a  range  of  frequencies  spanning  eight  orders 
of magnitude, with sensitive recovery at frequencies as low as 1 × 10^-7
per  genome. 

Because  we  reached  the  endogenous  deletion  frequency  of  the 
background  DNA,  we  were  unable  to  test  lower  frequencies  in  the 
reconstruction  experiment.  To  determine  whether  we  could  detect  even 
rarer  events,  we  applied  3D  to  mtDNA  isolated  from  muscle  samples  of 
mice,  choosing  a  site  encompassing  the  light  chain  origin  of  replication 
(Supplementary  Note  4).  Because  deletion  of  this  site  would  severely 
impede  the  ability  of  the  genome  to  replicate,  we  expected  the  deletion 
frequency  at  this  site  to  be  extremely  low.  3D  analysis  revealed  a  deletion 
frequency of 1.3 ± 0.4 × 10^-8 per genome (Fig. S3).

Fig. 2 Sensitivity and recovery. 3D was performed on TaqI-digested HCT 116
mtDNA using primers and probes for the human ND1/ND2 site to give the
endogenous deletion frequency (empty circles). Reconstruction experiments were
performed by spiking in 3 molecules µL^-1 of a control plasmid bearing a portion of
the human mitochondrial genome with a known deletion (3534Δ997) into a serial
dilution series of TaqI-digested HCT 116 mtDNA (filled circles). The predicted
deletion frequency is plotted against the measured deletion frequency. Each data
point represents an individual experiment. The reconstruction data were fit to y = x
(dotted line) with a correlation coefficient R^2 = 0.9942.

Capturing and analyzing sample complexity

Next we characterized the ability of 3D to perform accurate quantification
of the deletion frequency when applied to a heterogeneous
population of deletions. To this end, we obtained three control plasmids,
each containing an mtDNA fragment harboring a unique deletion from
the minor arc of the human mitochondrial genome (3534Δ997,
3719Δ809, and 3871Δ492). We subjected equal amounts (300 molecules
µL^-1) of each control plasmid to 3D analysis, either separately or
combined  into  a  single  reaction,  to  determine  whether  3D  could 
accurately  report  the  known  concentration  of  a  mixture  of  target 
molecules  (Fig.  3A).  3D  quantification  of  the  individual  plasmids  yielded 
concentrations of 313 ± 6, 304 ± 6, and 322 ± 6 molecules µL^-1,
respectively (Fig. 3B). Quantification of the combined reaction yielded a
concentration of 915 ± 12 molecules µL^-1. These values match the
expected  concentrations  within  the  limits  of  uncertainty  due  to  the 
stochastic  effect  associated  with  sampling  of  a  dilute  solution  (Pinheiro 
et  al.,  2012). 

Analysis  of  fluorescence  amplitudes  of  the  three  control  plasmids 
following  ddPCR  revealed  that  under  the  current  conditions,  a  given 
template  will  yield  an  average  droplet  fluorescence  intensity  inversely 
proportional  to  the  template  size  (Fig.  3C).  When  the  three  control 
templates  were  combined,  this  effect  led  to  a  striking  multimodal 
distribution  in  the  fluorescence  amplitudes  (Fig.  3A).  More  generally,  we 
found  that  the  sample  heterogeneity  is  reflected  in  the  distribution  of 
fluorescence  amplitudes  (Fig.  3C).  Thus,  the  average  amplitude  and 
distribution  of  the  droplet  fluorescence  can  be  used  to  predict  deletion 
sizes  and  complexity  (e.g.,  presence  of  a  single,  clonal  deletion  vs.  a 
heterogeneous  population  of  multiple  deletions). 

Deletion  dynamics  in  aging  postmitotic  tissue 

While  it  is  known  that  mtDNA  deletions  accumulate  to  relatively  high 
levels  in  aged,  postmitotic  tissue  in  humans  (Cortopassi  &  Arnheim, 
1990), very little is known about the underlying dynamics. Specifically, as
a  tissue  ages  and  accumulates  deletions,  it  is  unknown  whether  this 
increased  deletion  load  arises  through  clonal  expansion  of  an  existing 
pool  of  mtDNA  deletions  (early  acquisition),  continual  accumulation  of 
new  mutations  (late  acquisition),  or  an  equilibrium  of  both  processes 
(Khrapko,  2011).  With  3D,  we  can  now  begin  to  directly  assess  these 
longitudinal  changes.  We  used  3D  to  characterize  deletions  with  respect 
to  age  at  two  regions  of  the  mitochondrial  genome  from  a  collection  of 
human  brain  tissue  (Fig.  4).  Using  the  quantification  process  pathway  of 
3D,  we  found  that  the  total  deletion  frequency  increases  with  age  at 
both  sites  (Figs  5A  and  S4).  The  common  deletion  was  found  to  gain  in 
frequency from 1.91 ± 0.15 × 10^-6 per genome at age 15 years to
levels as high as 6.36 ± 0.20 × 10^-4 per genome by age 80 years, an
increase of over 300-fold (Table 1). These levels and accumulation rates
are in agreement with previously published results (Meissner et al.,
2008). At the ND1/ND2 site, the absolute levels of accumulation also
increased, but were generally lower than at the common deletion site.
Deletion frequencies ranged from 1.9 ± 0.5 × 10^-7 per genome to
5.25 ± 0.22 × 10^-6 per genome, about a 25-fold increase over the
same age span (Table 1). Interestingly, the increase in deletion frequency
at the ND1/ND2 site showed a stronger correlation with age than the
common deletion site (R^2 = 0.812 vs. 0.453, respectively) (Fig. 5A).
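
The fold increases quoted above follow directly from the reported endpoint frequencies. The short calculation below uses only those central values and ignores the stated uncertainties, so it is an arithmetic check rather than a statistical estimate.

```python
def fold_change(start, end):
    """Ratio of the final to the initial deletion frequency."""
    return end / start


# Reported deletion frequencies per genome at the youngest and oldest ages.
common_deletion_fold = fold_change(1.91e-6, 6.36e-4)
nd1_nd2_fold = fold_change(1.9e-7, 5.25e-6)

print(f"common deletion: {common_deletion_fold:.0f}-fold")
print(f"ND1/ND2 site: {nd1_nd2_fold:.1f}-fold")
```

The results, roughly 333-fold and 28-fold, are in line with the rounded figures given in the text.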

Fig. 3 Effects of sample heterogeneity on 3D analysis. (A) Three plasmid controls (3534Δ997, 3719Δ809, and 3871Δ492) were diluted to an expected concentration of
300 molecules µL^-1 template and subjected to 3D analysis, either individually or combined. Blue dots represent droplets whose amplitudes are above the threshold
('positives'), while gray droplets are those whose amplitudes are below the threshold value ('negatives'). (B) Measured deletion concentration for individual and combined
templates. Error bars indicate the Poisson 95% confidence intervals for each concentration determination. (C) Box and whisker plot showing the distribution of positive
droplets for each template. When used as a template in PCR, each plasmid yields different size fragments (185 bp, 372 bp, and 686 bp, respectively). There is an inverse
relationship between average fluorescence amplitude and template length, as well as a relationship between the sample complexity and the breadth of the distribution of
positive droplets.

Fig. 4 Deletion sites for 3D analysis of brain mitochondrial DNA. Probe and primer
sets were designed to detect deletions in two regions of the human mitochondrial
genome. The first region is defined by a primer set that flanks np 8299-13470 and
is designed to detect variants of the common deletion. The second primer set
flanks np 3497-4676, spanning the junction between the ND1 and ND2 genes in
the minor arc. The common deletion primer set flanks ten TaqI sites, while the
ND1/ND2 primer set flanks four TaqI sites.

To determine whether the increases in deletion frequency at these
sites were due to expansion of existing deletions or acquisition and
accumulation of new deletions, we sought to measure the ratio of
unique to total deletions as a function of age. To accomplish this,
emulsion droplets for a subset of patients (n = 11) were disrupted and
the enriched mutant fragments recovered. We then performed high-throughput
massively parallel sequencing analysis on each collection of
amplified targets. In this way, we were able to directly profile the entire
population of amplified deletion fragments at high resolution. From
these data, we were able to determine the total number of unique
deletion events present per sampled patient, which was then normalized
against the total number of deletions in the sample (Table 1, Fig. 5B).
Linear  regression  analysis  showed  no  significant  correlation  between  the 
ratio  of  unique  to  total  deletions  and  age  at  either  site  (P  =  0.120  and 
P = 0.150 for the ND1/ND2 and common deletion sites, respectively). To
ensure  that  our  data  are  not  influenced  by  sampling  or  processing 
artifacts,  we  analyzed  a  number  of  parameters,  including  the  total 
number  of  genomes  isolated  and  screened,  the  number  of  droplets  used 
in ddPCR analysis, and site saturation effects (Data S1, Supplementary
Note  5).  Analysis  of  these  parameters  indicates  that  our  data  are  free 
from  any  such  confounding  effects  that  might  artificially  skew  our  results 
(Figs  S4,  S5,  and  S6). 

We  next  analyzed  how  the  diversity  within  the  pool  of  deletions 
might  change  with  respect  to  age.  Analysis  of  the  amplitude  distribution 
of  positive  droplets  from  ddPCR  predicts  that  there  is  low  heterogeneity 
at  the  common  deletion  site  and  high  degree  of  heterogeneity  at  the 
ND1/ND2  site  (Fig.  S7).  However,  at  both  sites,  the  diversity  does  not 
appear  to  change  with  age.  These  findings  were  confirmed  through 
breakpoint  analysis  of  the  sequenced  deletions.  Each  unique  deletion 
was  individually  analyzed  and  characterized  by  deletion  length  and 
relative  frequency  in  the  deletion  pool  (Fig.  6,  Data  S2).  At  the  common 
deletion site, we observed a single dominant deletion in every case,
which contributed to over 90% of the deletion load (Figs 6 and S8).
Although  several  minor  variants  are  present  in  each  patient,  most 
generally contributed < 0.5% of the total deletion burden. At the ND1/ND2
site, however, there is a broad but fairly uniform distribution of
deletion sizes within the ND1/ND2 deletion space across individuals of
all ages (Fig. 6, bottom panel). The bulk of the deletion load was
typically composed of deletions that individually contributed between
1 and 10% to the total deletion burden (Figs 6 and S8). The data
indicate no major shift with age in either the size distribution of deletions
or the relative pools of high- and low-frequency deletions (Figs 6
and S7).

Finally, we examined the average frequency of individual deletions
with respect to age. This was found by taking the ratio of the deletion
frequency to the total number of unique deletions, a value we term the
expansion index, which is then normalized against the youngest time
point for clarity. A decrease in the normalized expansion index with
respect to time denotes that deletions are being selected against, while
an increase suggests positive selective pressure. We found that at both
sites, the expansion index increases significantly with age (Fig. 5C).
Given the static spectrum of deletion diversity with age, we
conclude that expansion of a pre-existing set of deletions may be one of
the primary drivers of age-related increases in deletion frequency.
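
The expansion index described above can be illustrated with a minimal sketch. The function name and the input values are hypothetical and chosen only to show the arithmetic; they are not measurements from this study, and the authors' own calculation may differ in detail.

```python
from typing import List, Sequence


def normalized_expansion_index(
    ages: Sequence[float],
    deletion_freqs: Sequence[float],
    unique_counts: Sequence[float],
) -> List[float]:
    """Deletion frequency divided by the number of unique deletions,
    expressed relative to the youngest sample (hypothetical helper)."""
    if not ages or not (len(ages) == len(deletion_freqs) == len(unique_counts)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Raw expansion index: average frequency per unique deletion.
    raw = [f / u for f, u in zip(deletion_freqs, unique_counts)]
    # Normalize against the youngest time point.
    youngest = min(range(len(ages)), key=lambda i: ages[i])
    return [r / raw[youngest] for r in raw]


# Illustrative (made-up) inputs: three samples of increasing age.
ages = [30, 45, 60]
freqs = [2.0e-6, 6.0e-6, 1.2e-5]   # total deletion frequency per genome
uniques = [100, 100, 120]          # number of unique deletions observed
print(normalized_expansion_index(ages, freqs, uniques))
```

In this toy input the frequency rises sixfold while the number of unique deletions barely changes, so the normalized index rises with age, which is the pattern the text interprets as expansion of pre-existing deletions.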


Discussion 

To adequately detect de novo mtDNA deletions and trace the frequency
dynamics, an assay is needed that can enrich for and directly quantify
extremely rare deletion events. Current approaches to analyzing mtDNA
deletions include Southern blotting (DiMauro & Hirano, 1993), direct
sequencing (Spelbrink et al., 2000; Ameur et al., 2011; Kato et al.,
2011; Sequeira et al., 2012), and PCR amplification (Kraytsberg et al.,
2008). Sequencing of deletions via cloning is laborious, time-consuming,
prone to cloning artifacts and allows only the most abundant deletion
types to be analyzed (Supplementary Notes 3 and 4). Massively parallel
or 'next-generation' sequencing is rapidly becoming a preferred means
for high-throughput screening of individual DNA molecules. As an
example, Illumina, Inc. (San Diego, CA, USA) offers systems that
generate from 17 million (MiSeq®) up to 3 billion simultaneous
sequencing reads per run (HiSeq®) (Liu et al., 2012). However, given a
relatively short read length of < 150 bp and the fact that the majority of
the reads will be off-target, this remains insufficient to adequately
resolve mtDNA deletions that occur at frequencies of less than one in a
million genomes. Even assuming no off-target reads, the MiSeq®
instrument would still only yield about one deletion in ten runs. It is
therefore critical that a selection step be performed to limit the number
of off-target reads and to enrich for deletion-bearing molecules.

Table 1 Frequencies of mitochondrial deletion events in human brain. The error of duplicate measurements is indicated as the standard error of the mean (SEM)

              Measured deletion frequency (× 10⁻⁷)      Unique deletions (per 1000 total)
ID     Age    ND1/ND2    SEM     Common     SEM         ND1/ND2    Common
P01†   28     11.3       1.1     141.5      4.2         65.2       5.0
P02†   28     5.2        0.1     38.7       4.2         224.0      9.8
P03†   43     20.0       1.7     6355.2     61.0        126.1      0.5
P04†   30     3.3        2.9     335.8      314.6       227.0      0.8
P05†   38     11.7       1.2     1883.5     556.8       115.3      1.3
P06†   38     31.1       0.1     3687.8     28.6        69.8       0.9
P07†   39     13.0       3.7     1276.0     19.7        139.7      0.7
P08†   43     23.9       0.1     1550.5     5.1         77.5       1.0
P09†   46     20.3       1.2     2174.5     16.2        157.3      1.9
P10†   54     35.3       0.4     1996.7     115.5       84.3       2.7
P11†   64     46.0       1.5     4856.7     162.3       67.6       0.8
P12    43     5.0        0.1     96.0       3.1
P13    37     13.1       0.2     1389.2     14.8
P14    19     7.3        1.1     285.8      13.3
P15    32     19.1       1.7     470.0      5.7
P16    15     1.9        0.5     19.1       1.5
P17    45     16.0       0.2     3739.3     154.1
P18    26     8.4        2.0     961.2      42.3
P19    80     52.5       2.2     3084.8     20.0
P20    78     48.0       6.3     3596.7     5.6
P21    71     36.5       4.1     5069.5     130.5

†Used in NGS analysis.
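
As a rough check on the sequencing-depth argument made in the Discussion for unenriched MiSeq sequencing, the following back-of-envelope sketch estimates how many reads per run would be expected to span the breakpoint of a deletion present at one in a million genomes. The model is an assumption introduced here for illustration (every read on target, reads spread uniformly over a 16 569-bp human mitochondrial genome, and a deletion counted only if a read covers its breakpoint); it is not the authors' stated calculation.

```python
MTDNA_LENGTH_BP = 16_569  # length of the human mitochondrial reference genome


def expected_breakpoint_reads(
    reads_per_run: float,
    read_length_bp: float,
    deletion_frequency: float,
    genome_length_bp: float = MTDNA_LENGTH_BP,
) -> float:
    """Expected reads per run that cover one given breakpoint.

    Assumes all reads are on target and uniformly distributed, so the
    mean depth at any single position is reads * read_length / genome_length.
    """
    mean_depth = reads_per_run * read_length_bp / genome_length_bp
    return mean_depth * deletion_frequency


# Figures quoted in the text: 17 million reads, < 150 bp reads,
# deletion frequency of about one in a million genomes.
per_run = expected_breakpoint_reads(17_000_000, 150, 1e-6)
print(f"expected breakpoint-spanning reads per run: {per_run:.2f}")
print(f"runs needed per observed deletion: {1 / per_run:.1f}")
```

Under these simplifying assumptions the expectation is well below one informative read per run, which is the same order of magnitude as the estimate given in the text and illustrates why an enrichment step is needed.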

PCR-based  methods,  including  long-distance  PCR  and  real-time 
quantitative  PCR,  are  among  the  most  frequently  employed  methods 
for  both  selection  and  amplification  of  deletions  (Cortopassi  &  Arnheim, 
1990;  He  et  al.,  2002;  Chabi  et  al.,  2003;  Kraytsberg  et  al.,  2008). 
Generally  speaking,  these  assays  distinguish  wild-type  from  deleted 
genomes  through  exploiting  differences  in  amplicon  fragment  lengths 
and  amplification  efficiencies.  Given  that  they  do  not  select  for  deleted 
molecules  prior  to  amplification,  one  of  the  main  drawbacks  is  high 
background  signal  from  contaminating  wild-type  molecules,  thus 
limiting  the  effective  sensitivity.  Furthermore,  these  bulk  PCR  assays 
tend  to  introduce  a  number  of  additional  artifacts  arising  from 
preferential amplification of small templates (allelic preference), introduction
of false deletions through template jumping, and other PCR
errors (Kraytsberg & Khrapko, 2005). Real-time quantitative PCR (qPCR)
can be quite sensitive, but its reliance on relative differences in crossing
thresholds rather than direct quantification makes it more suitable for
measuring fold changes than absolute deletion frequencies (He
et al., 2002; Chabi et al., 2003). Digital PCR methods, including long
single-molecule PCR (long smPCR) (Kraytsberg & Khrapko, 2005; Guo
et al., 2010) and the random mutation capture assay developed for
mtDNA deletions (deletion RMC) (Vermulst et al., 2008a,b), achieve
direct quantification through the use of single-molecule partitioning in
96-well plates. Partitioning additionally serves to minimize artifacts of
template jumping and allelic preference that are common in bulk PCRs
(Kraytsberg & Khrapko, 2005). Despite these advantages, this approach
becomes laborious and costly when using the wells of a multiwell plate
as the partition and yields only a handful of the most common deletions
within a sample.

The Digital Deletion Detection (3D) assay shows a marked improvement
in specificity, sensitivity, and accuracy over other available
methods.  This  is  achieved  via  a  three-step  process  of  selection, 
amplification,  and  characterization  (i.e.,  quantification  or  sequencing). 
As  with  deletion  RMC,  high  specificity  for  deletion-bearing  molecules  is 
achieved  through  the  destruction  of  WT  template  molecules  by 
restriction  endonuclease,  thereby  selecting  for  and  enriching  mutant 
molecules  prior  to  amplification.  Following  enrichment,  partitioning  for 
digital  PCR  amplification  is  performed  through  the  generation  of  up  to 
20  000  droplet  partitions,  the  equivalent  of  over  200  96-well  plates, 
within  a  single  reaction  well.  Quantification  is  greatly  facilitated  through 
the  use  of  TaqMan  reporter  probes  and  cytometry,  which  allows  for 
rapid  enumeration  of  all  partitions  that  contain  an  amplifiable  template 
and  direct  quantification  of  all  deletions  within  a  sample. 

One  of  the  unique  advancements  of  the  3D  assay  is  the  wealth  of 
single-molecule  information  that  is  obtained  from  cytometric  analysis  of 
the  droplet  partitions.  In  other  mtDNA  deletion  detection  assays, 
hundreds  of  wells  must  be  screened  to  yield  a  handful  of  successful 
amplifications.  The  corresponding  template  molecules  can  only  be 
characterized  through  the  additional  steps  of  gel  electrophoresis  or 
sequencing.  This  process  will  tend  to  oversample  large  clonal  deletions 
and  thus  may  not  yield  a  true  representation  of  the  biological  diversity  of 
deletions  present  (see  Supplemental  Note  3).  In  contrast,  3D  provides  an 
opportunity  to  robustly  screen  tens  of  thousands  of  droplet  partitions, 
yielding  hundreds  of  positive  reactions  and  allowing  analysis  of  a  more 
complete  set  of  deletions  in  the  sample.  Moreover,  the  demonstrated 
inverse  relationship  between  template  size  and  the  endpoint  fluorescent 
intensity  of  the  droplet  partitions  (Fig.  3C)  can  be  exploited  to  reveal 
information regarding the size and homogeneity of the templates in the
sample. By analyzing the amplitude distribution of positive droplets, we
were able to accurately predict whether the deletion population
consisted of a few clonal expansions or a large collection of random
deletions (Figs 3 and S7). In this way, cytometric analysis of the partitions
could be used to gather information about the size spectrum of deletion
templates in the sample without being subject to the biases inherent in
individual cloning or the costs of deep sequencing. We believe that with
further development, this relationship could potentially be exploited to
open new possibilities for 'next-generation' PCR technology that can
dynamically sort and collect specific amplification products, similar to
fluorescence-activated cell sorting with flow cytometry.

Fig. 5 3D analysis of deletion frequency in aged human brain tissue. (A) Total
deletion frequency at each site is plotted against age. Deletion frequency shows a
positive correlation with age at both the common deletion site (R² = 0.453,
P = 0.0008) and the ND1/ND2 site (R² = 0.812, P = 3 × 10⁻⁸). The linear
regression (transformed) is shown on a log-scale plot. (B) Deletion profiling was
performed on a subset of patients to examine the diversity of deletions present in
the deletion pool. The number of unique deletions is shown normalized against the
total number of deletions for each patient in the subset. At the
common deletion site, patients showed a range of 0.5-10 unique per 1000
deletions. At the ND1/ND2 site, the deletion diversity was much higher, ranging
from 65 to 227 per thousand. Linear regression analysis showed no significant
correlation between the unique-to-total deletion ratio and age at either site
(P = 0.15 and P = 0.12 for the common and ND1/ND2 deletion sites, respectively).
(C) The relative expansion index for each patient in the subset was found by taking
the ratio of total deletion frequency over the number of unique deletions
normalized against the youngest time point. This value gives an estimate of the
average frequency of individual deletions for each patient relative to the youngest
time point (i.e., the average individual deletion frequency). Linear regression
showed a positive correlation for both the common deletion site (R² = 0.697,
P = 0.001) and the ND1/ND2 site (R² = 0.421, P = 0.03).

Another  advantage  of  the  3D  assay  is  its  ability  to  adjust  the  search 
parameters  to  measure  many  different  target  deletion  sets.  This  is 
achieved  by  defining  the  target  deletion  space  through  careful  choice  of 
primer  locations  and  the  restriction  enzyme.  This  is  an  important 
advantage  over  many  existing  methods  in  that  random  deletions  within  a 
target  region  can  be  analyzed  without  knowing  the  precise  breakpoints 
of  the  target  deletion  set  a  priori.  It  is  noteworthy  that  we  were  able  to 
measure the deletion loads at both sites simultaneously, even though the
minor arc deletion frequency was up to 100-fold lower than that of the major arc.
In  many  other  assays,  this  information  would  be  lost  to  the  dominant 
signal  of  the  clonal  expansions.  Importantly,  the  assay  is  also  neutral  with 
regard  to  random  (i.e.,  steady-state  temporal  deletions  that  occur  at  low 
frequency)  vs.  clonal  events  (i.e.,  deletions  that  have  expanded  out  of  the 
steady-state  pool  and  that  occur  at  relatively  high  frequency):  the  assay 
will  detect  all  deletions  that  fall  within  the  defined  deletion  space.  Thus, 
our  assay  is  able  to  account  for  gain  or  loss  of  steady-state  temporal 
deletions  as  well  as  clonal  expansions. 
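
The idea of a target deletion space defined by primer locations and restriction sites can be sketched as a simple predicate. This is an illustrative simplification rather than the authors' implementation: the class name, the coordinates, and the rule that a detectable deletion must lie between the primers and remove every restriction site in the amplicon are assumptions inferred from the description above, and the probe binding site and the width of each recognition site are ignored.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DeletionSpace:
    """Hypothetical description of one target region (1-based coordinates)."""
    fwd_primer_end: int                 # last base of the forward primer site
    rev_primer_start: int               # first base of the reverse primer site
    restriction_sites: Tuple[int, ...]  # enzyme sites lying between the primers

    def detects(self, del_start: int, del_end: int) -> bool:
        """True if a deletion spanning del_start..del_end (inclusive) would
        survive digestion and still be amplifiable by the flanking primers."""
        # Both primer binding sites must be left intact.
        between_primers = (
            self.fwd_primer_end < del_start <= del_end < self.rev_primer_start
        )
        # Any restriction site left in the amplicon would let the enzyme cut it.
        removes_all_sites = all(
            del_start <= site <= del_end for site in self.restriction_sites
        )
        return between_primers and removes_all_sites


# Made-up coordinates for illustration only.
space = DeletionSpace(fwd_primer_end=100, rev_primer_start=2000,
                      restriction_sites=(500, 900))
print(space.detects(400, 1000))  # removes both sites, primers intact
print(space.detects(600, 1000))  # leaves the site at 500 in place
print(space.detects(50, 1000))   # removes the forward primer site
```

Because the predicate depends only on where a deletion starts and ends relative to the primers and restriction sites, any breakpoint pair that satisfies it is captured, which mirrors the point that precise breakpoints need not be known in advance.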

Finally,  by  coupling  NGS  with  the  other  steps  in  3D,  we  are  able  to 
perform  high-resolution  characterization  of  millions  of  breakpoints 
within  a  single  sequencing  run.  To  demonstrate  the  utility  and  sensitivity 
of  this  assay,  we  analyzed  deletion  loads  within  the  mitochondrial 
genome  of  human  brain  samples.  For  example,  at  the  ND1/ND2  site,  we 
interrogated  over  8  billion  mitochondrial  genomes  and  identified  over 
100  000  genomes  with  a  deletion  within  our  target  region.  At  that  site, 
we  were  able  to  characterize  430  individual  unique  deletions  with  an 
average  sequencing  coverage  of  78-fold.  Furthermore,  based  on  the 
specific  sequencing  coverage,  we  were  able  to  distinguish  between 
clonally  expanded  and  random,  'steady-state  temporal'  deletions.  To  our 
knowledge, no other assay has demonstrated the capability of identifying
and analyzing such a large deletion set with comparable resolution.

Digital  Deletion  analysis  allows  for  unbiased,  high-resolution  analysis 
of  the  full  spectrum  of  deletions  within  the  target  site.  With  this  tool,  we 
can  better  analyze  the  mechanics  and  kinetics  of  deletion  acquisition  and 
expansion  in  aging  tissue.  Accumulation  of  mtDNA  deletions,  particularly 
in  postmitotic  tissue,  is  an  important  cause  of  human  pathology  and 
aging (Cortopassi & Arnheim, 1990; Meissner et al., 2008; Vermulst
et al., 2008b; Khrapko & Vijg, 2009). While it is known that deletions
can accumulate through a process of clonal expansion of a pre-existing
pool of deletions, it is unclear whether this or an accelerated rate of de
novo deletions is the primary driving force behind age-related deletion
accumulation (Khrapko, 2011). Previous studies using mathematical
simulations of cell division or analysis of the distribution of deletions in
tissues conclude that many mtDNA mutations may have an early origin
and have been subsequently expanded (Brierley et al., 1998; Elson et al.,
2001; Khrapko et al., 2003, 2004; Payne et al., 2011). However, work
from some of the same groups has also led to the opposite conclusion
that mtDNA deletions may be of late origin (Nicholas et al., 2009). To
address  this  issue,  we  used  3D  to  characterize  the  absolute  deletion 
frequency  and  deletion  spectrum  of  aging  brain  tissue  at  two  regions  of 
the  mitochondrial  genome.  We  found  that  the  total  deletion  load 
increases,  but  that  the  ratio  of  unique  to  total  deletions  did  not  change 
from  younger  to  older  tissue.  Furthermore,  we  observed  little  change  in 
the  size  distribution  of  deletions  as  well  as  the  relative  pools  of  high-  and 
low-frequency  deletions  indicating  a  fairly  static  spectrum  of  diversity. 
An  important  caveat  is  that  in  the  present  work,  we  are  not  actually 
tracing  the  dynamics  of  specific  deletions  with  time,  but  are  rather 
harvesting  snapshots  of  the  deletion  burden  across  several  individuals. 
Thus, we cannot rule out the contribution of newly acquired deletions to
later time points. This is particularly true in the case of the common
deletion where the dominance of a single-deletion species at this site
makes it impossible to determine whether we are observing clonal
expansion or rapid re-accumulation of the same deletion. However, at
the ND1/ND2 locus, we were able to recover a large diversity of deletions
without such site saturation (Figs 6 and S6). Thus, within the time frame
analyzed (aged 28-80 years), our data support the hypothesis that
expansion rather than generation of new deletions dominates the
age-related increase in deletion load.

Fig. 6 High-resolution analysis of deletion dynamics. Density dot plots showing the length distribution (y-axis) and relative frequency (point size) of all unique deletions
per patient in the sequenced subset (x-axis, plotted as age). Box-and-whisker plots (gray) in the background show the 95% confidence interval of the
unweighted length distribution. The common deletion site is shown on the top plot, and the ND1/ND2 deletion site is shown on the bottom plot.

The fact that early mutations are allowed to accumulate to significant
levels may be interpreted as evidence for some sort of selective pressure.
Precisely what that pressure is, however, remains unclear. Our data show
a uniform, random distribution of deletion lengths at the ND1/ND2 site
across all ages. The absence of a shift in the diversity toward
accumulation of larger deletions argues against the hypothesis that
smaller mtDNA molecules possess a replicative advantage in postmitotic
cells (Wallace, 1989; Fukui & Moraes, 2009). Our data are not
inconsistent with in silico experiments that predict that clonal expansion
can result from random genetic drift without the aid of selection (Coller
et al., 2001; Elson et al., 2001). While this model has been somewhat
validated for point mutations (Durham et al., 2006), other selective
mechanisms for deletions cannot be ruled out (de Grey, 2009). 3D will
allow us to perform longitudinal studies that trace the kinetics of
clonal expansion of real deletions and thereby to better test the in
silico models with data from living cells.

The  3D/NGS  data  demonstrate  that  we  now  have  the  technology  to 
perform  high-resolution  analysis  and  detailed  characterization  of 
extremely  rare  deletion  events.  Importantly,  it  also  provides  the  means 
to  begin  to  use  mtDNA  deletions  as  biomarkers  for  disease.  Although 
mtDNA  deletions  accumulate  readily  in  skeletal  muscle  and  brain  tissue, 
they  exist  at  extremely  low  levels  in  blood  and  other  rapidly  proliferating 
tissue  (DiMauro  &  Hirano,  1993).  This  has  been  a  great  hindrance  to  the 
development  of  blood-based  biomarker  assays  that  could  be  used  for 
noninvasive screening and early detection of mitochondrial deletion
diseases. Digital Deletion Detection provides an important new tool that
will  allow  researchers  to  better  study  the  mechanisms  of  deletion 
formation,  their  mechanisms  of  expansion,  and  their  role  in  the  etiology 
of  aging  and  disease. 

Experimental  procedures 

Human  brain  tissue 

Histologically normal human brain tissue from informed patients was
obtained from the tissue depository of the Department of Neurological
Surgery at the University of Washington. Tissue and demographic
information was obtained in accord with an IRB-approved protocol
(Table 1).

DNA  isolation 

To obtain whole DNA from human brain tissue, tissue samples
(50-250 mg) were immersed in 5 mL homogenization medium
(0.32 M sucrose, 1 mM EDTA, 10 mM Tris-HCl, pH 7.8) and disrupted
with a glass Dounce-type homogenizer. The homogenate was transferred
to a 15-mL tube and centrifuged at 4000 g. The pellet was
resuspended in 3 mL lysis buffer (10 mM Tris-HCl, pH 8.0, 150 mM NaCl,
20 mM EDTA, 1% SDS, and 0.2 mg mL⁻¹ proteinase K) and incubated
at 55 °C for 3 h. DNA was isolated by phenol-chloroform extraction
followed by isopropanol precipitation.

Endonucleolytic  enrichment  of  mtDNA  deletions 

Rare deletion-bearing molecules were selectively enriched through
endonucleolytic destruction of wild-type target sites. First, a 400 µL
digestion reaction mixture was prepared containing 10 µg of genomic
DNA, 8 µL (800 U) of TaqI (New England Biolabs), and TaqI reaction
buffer (Fermentas). The reaction mixture was divided into 4 × 100 µL
reactions and incubated at 65 °C for 4-6 h. An additional 200 U of TaqI
was added to each reaction every hour. After each TaqI addition,
samples were thoroughly mixed and briefly centrifuged to ensure
efficient digestion. Following the digestion procedure, the reactions
were recombined, extracted once with phenol/chloroform/isoamyl
alcohol (25:24:1, v/v), precipitated by ethanol, and resuspended in
1 mM Tris, pH 8.

TaqMan  probe  and  primer  design 

The following primer/probe sets were used with human total DNA for
mtDNA deletion detection. Control site: 5'-CTA AAA ATA TTA AAC ACA
AAC TAC CAC CTA CCTC-3' (forward primer), 5'-GTT CAT TTT GGT TCT
CAG GGT TTG TTA TAA-3' (reverse primer), and 5'-6FAM-CCT CAC
CAA AGC CCA TA-MGB-3' (probe). ND1/ND2 site: 5'-CGC CAC ATC
TAC CAT CACC-3' (forward primer), 5'-GAT TAT GGA TGC GGT TGC
TT-3' (reverse primer), and 5'-6FAM-TTG ATG GCA GCT TCT GT-MGB-3'
(probe). Common deletion site: 5'-TAC CCC CTC TAG AGC CCA
CT-3' (forward primer), 5'-GAG GAA AGG TAT TCC TGC TAA TGCT-3'
(reverse primer), and 5'-6FAM-TGG CCC ACC ATA ATT-MGB-3' (probe).

Droplet  digital  PCR 

The final concentration of digested DNA was adjusted to yield less than
~3500 positive molecules per µL, which is within the range of linearity for
the Poisson calculation (Pinheiro et al., 2012). Reaction mixtures (25 µL)
contained ddPCR Master Mix (Bio-Rad), 250 nM TaqMan probe, and
1-2 µL of digested DNA (0-2 µg total). Appropriate flanking primers
were added at either 900 nM or 45 nM for the quantification and
sequencing process pathways, respectively (see Supplementary Notes 1
and 2). Reaction droplets were made by applying 20 µL of each reaction
mixture to a droplet generator DG8 cartridge (Bio-Rad) for use in the
QX100 Droplet Generator (Bio-Rad). Following droplet generation, 38 µL
of the droplet emulsion was carefully transferred to a Twin.tec semi-skirted
96-well PCR plate (Eppendorf), which was then heat-sealed with a
pierceable foil sheet. To amplify the fragments, thermal cycling was
carried out using the following protocol: initial denaturation step at 95 °C
for 10 min, followed by 40 cycles of 94 °C for 30 s, and 63.5 °C for
4 min. The thermally cycled droplets were either (i) analyzed by flow
cytometry for fluorescence analysis and quantification of deletion
frequencies (see Methods S1) or (ii) disrupted and the PCR products
recovered and sequenced in order to verify deletions and characterize the
deletion sites (see Methods S1). All experiments were performed in
triplicate.

Analysis  of  fluorescence  amplitude  and  quantification  of 
deletions 

Following normal thermal cycling, droplets were individually scanned
using the QX100™ Droplet Digital™ PCR system (Bio-Rad). Positive
(deletion-bearing) and negative droplets were distinguished on the basis
of fluorescence amplitude using a global threshold. The number of
mutant genomes per droplet was calculated automatically by the
accompanying software (QuantaSoft, Bio-Rad) using Poisson statistics as
described elsewhere (Hindson et al., 2011). Quantification of deletion
frequency requires ddPCR amplification using two primer sets. The first
primer set flanks the test region and measures the concentration of
deletion-bearing molecules. The second primer set flanks a distant region
in the genome that bears no restriction recognition sites. This second or
control set measures the concentration of all mtDNA genomes. Because
de novo deletions are so rare, reactions using the different primer sets
must be run using different dilutions of the digested DNA, and the
results normalized against the mass of total DNA in the reaction.
Deletion frequency is calculated by taking the ratio of the normalized
concentration of deletion-bearing mtDNA molecules to that of the total
mtDNA molecules screened. Reactions that yielded < 10 positive
droplets per well were scored conservatively as having no positives
above background (Pinheiro et al., 2012).
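
The quantification steps above can be sketched as follows. This is a minimal illustration that uses the standard Poisson correction for digital PCR (mean copies per droplet = -ln(1 - positive/total)); it does not reproduce the vendor software. The function names and droplet counts are hypothetical, and the sketch assumes both wells use the same reaction volume so that droplet volume cancels in the ratio.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean number of target molecules per droplet."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total and total > 0")
    return -math.log(1.0 - positive / total)


def deletion_frequency(
    del_positive: int, del_total: int, del_ng: float,
    ctrl_positive: int, ctrl_total: int, ctrl_ng: float,
    min_positive: int = 10,
) -> float:
    """Ratio of deletion-bearing to total mtDNA molecules.

    Each well's Poisson-corrected signal is normalized to the mass of DNA
    loaded (ng), because the two primer sets are run at different dilutions.
    Wells with fewer than min_positive positive droplets are scored as zero.
    """
    if del_positive < min_positive:
        return 0.0
    deletions_per_ng = copies_per_droplet(del_positive, del_total) / del_ng
    genomes_per_ng = copies_per_droplet(ctrl_positive, ctrl_total) / ctrl_ng
    return deletions_per_ng / genomes_per_ng


# Made-up droplet counts: deletion well loaded with 1000 ng of digested DNA,
# control well loaded with a heavily diluted 0.001 ng.
freq = deletion_frequency(200, 15_000, 1000.0, 6_000, 15_000, 0.001)
print(f"deletion frequency: {freq:.2e}")
```

Normalizing each well to its DNA input before taking the ratio is what allows the rare-deletion reaction and the control reaction to be run at very different dilutions.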

Library preparation and Illumina sequencing

Human ND1/ND2 ddPCR amplification products were subjected to
template conversion as described in Methods S1. Reactions were cleaned
using the ZR-96 Clean and Concentrator-5 kit (Zymo Research). Template
concentrations were calculated using the Quant-iT PicoGreen dsDNA
Assay Kit (Invitrogen) following the manufacturer's recommended protocol.
Samples were then diluted to 0.2 ng µL⁻¹ in 10 mM Tris, pH 8.0, 1 mM
EDTA (TE). Fragmentation, adaptor ligation, and index ligation were
accomplished using the Nextera XT DNA Sample Preparation Kit
(Illumina) following the recommended protocol.

Because the common deletion breakpoint is within 100 bp of the 3' end
of the amplicon, the normal tagmentation protocol could not be followed.
Instead, adaptors were added directly via PCR using the following primers:
5'-TCG TCG GCA GCG TCA GAT GTG TAT AAG AGA CAG NNN NCG TAT
GGC CCA CCA TAA TTA CC-3' (forward) and 5'-GTC TCG TGG GCT CGG
AGA TGT GTA TAA GAG ACA GNN NNG AGG AAA GGT ATT CCT GCT
AAT GCT-3' (reverse). Thermal cycling consisted of an initial denaturation
at 95 °C for 10 min, followed by 8 cycles of 94 °C for 30 s, 58 °C for 30 s,
and 63.5 °C for 4 min. Reactions were cleaned using the ZR-96 Clean and
Concentrator-5 kit (Zymo Research), then quantified and diluted as
described above. 5 µL of 0.2 ng µL⁻¹ DNA was mixed with 20 µL TD
buffer prior to PCR amplification in the Nextera XT DNA Sample Prep
workflow. The rest of the Nextera XT protocol was performed according to
recommended procedures. Indexed ND1/ND2 and common deletion
fragments were pooled for all patients and sequenced using the MiSeq
Personal Sequencing System (Illumina) (see Methods S1). FASTQ files for
each patient were deposited in the NCBI Sequence Read Archive (SRA)
under project accession number SRP027401.

Reconstruction  experiments 

Genomic DNA was isolated from HCT 116 cells, chosen for their relatively
low endogenous deletion frequency of 1.8 × 10⁻⁷. Following TaqI
digestion, a series of 10-fold serial dilutions of the genomic DNA were
prepared, ranging over eight orders of magnitude. A 997-bp deletion
was isolated, amplified, and cloned into a vector for use as a control
molecule (Fig. 5B). Approximately 600 ng of the 3534Δ997 control
plasmid was serially diluted 100 million fold and subjected to a
preliminary 3D analysis to calculate the absolute concentration of
molecules within the dilution. To each of the genomic dilutions, three
copies of the 3534Δ997 control plasmid were added per microliter of
reaction. The reaction mixtures were then partitioned, cycled, and the
droplets analyzed to determine whether the small concentration of the
control molecules could be accurately assessed even in the presence of
high concentrations of background HCT 116 DNA.

Heterogeneous  population  reconstruction  experiments 

Three control plasmids (3534Δ997, 3719Δ809, and 3871Δ492) were
isolated from POLG D274A HeLa cells as described above (see also Fig. 3).
Each plasmid was serially diluted and subjected to preliminary 3D
analysis in order to calculate the concentration of molecules within each
dilution. Based on these quantifications, 300 molecules µL⁻¹ per template
were subjected to another round of 3D analysis, either separately
or combined into a single reaction.

© 2013 John Wiley & Sons Ltd and The Anatomical Society
Digital deletion detection, S. D. Taylor et al.

Regression  analysis 

Linear  regression  analyses  were  performed  in  R  using  the  built-in  Stats 
package  (R  Core  Team,  2013).  Significance  of  linear  models  was 
calculated  using  the  F-test  against  the  null  hypothesis  of  no  correlation 
between  the  variables  tested. 
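
For readers who want to see the arithmetic behind such a regression, the following is a minimal pure-Python sketch of ordinary least squares with R² and the F statistic for the slope. It is not the R code used in the study. The ages and ND1/ND2 frequencies are transcribed from Table 1; the log10 transformation is an assumption (the text states only that a transformed regression was plotted on a log scale), so the output need not match the reported values. The P value would be read from the F distribution with (1, n - 2) degrees of freedom and is not computed here.

```python
import math
from typing import Sequence, Tuple


def linear_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (slope, intercept, r_squared, f_statistic) for y ~ x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_res / syy
    # F statistic against the null hypothesis of zero slope, df = (1, n - 2).
    f_stat = (syy - ss_res) / (ss_res / (n - 2)) if ss_res > 0 else math.inf
    return slope, intercept, r_squared, f_stat


# Age (years) and ND1/ND2 deletion frequency (x 10^-7) for P01-P21, from Table 1.
ages = [28, 28, 43, 30, 38, 38, 39, 43, 46, 54, 64,
        43, 37, 19, 32, 15, 45, 26, 80, 78, 71]
nd1_nd2 = [11.3, 5.2, 20.0, 3.3, 11.7, 31.1, 13.0, 23.9, 20.3, 35.3, 46.0,
           5.0, 13.1, 7.3, 19.1, 1.9, 16.0, 8.4, 52.5, 48.0, 36.5]

log_freq = [math.log10(f) for f in nd1_nd2]  # assumed transformation
slope, intercept, r2, f_stat = linear_regression(ages, log_freq)
print(f"slope={slope:.4f}  intercept={intercept:.3f}  R^2={r2:.3f}  F(1,{len(ages) - 2})={f_stat:.1f}")
```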

Aging neural progenitor cells have decreased mitochondrial content and lower oxidative metabolism. Stoll EA,
Cheung W, Mikheev AM, Sweet IR, Bielas JH, Zhang J, Rostomily RC, Horner PJ. J Biol Chem. 2011
Nov;286(44):38592-601.
PMID:21900249

Roles of DNA polymerase I in leading and lagging-strand replication defined by a high-resolution mutation
footprint of ColE1 plasmid replication. Allen JM, Simcha DM, Ericson NG, Alexander DL, Marquette JT, Van
Biber BP, Troll CJ, Karchin R, Bielas JH, Loeb LA, Camps M. Nucleic Acids Res. 2011 Sep;39(16):7020-33.

A random mutation capture assay to detect genomic point mutations in mouse tissue. Wright JH, Modjeski KL,
Bielas JH, Preston BD, Fausto N, Loeb LA, Campbell JS. Nucleic Acids Res. 2011 Jun;39(11):e73.
PMID:21459851

Generation,  function,  and  prognostic  utility  of  somatic  mitochondrial  DNA  mutations  in  cancer.  Kulawiec  M, 
Salk  JJ,  Ericson  NG,  Wanagat  J,  Bielas  JH.  Environ  Mol  Mutagen.  2010  Jun;51(5):427-39. 

PMID:20544883 

Molecularly evolved thymidylate synthase inhibits 5-fluorodeoxyuridine toxicity in human hematopoietic cells.
Bielas JH, Schmitt MW, Icreverzi A, Ericson NG, Loeb LA. Hum Gene Ther. 2009 Dec;20(12):1703-7.
PMID:19694534

Quantification  of  random  mutations  in  the  mitochondrial  genome.  Vermulst  M,  Bielas  JH,  Loeb  LA.  Methods. 
2008  Dec;46(4):263-8. 

PMID:18948200 

Genetic  instability  is  not  a  requirement  for  tumor  development.  Bodmer  W,  Bielas  JH,  Beckman  RA. 

Cancer  Res.  2008  May  15;68(10):3558-60;  discussion  3560-1. 

PMID:1 8483234 

Cancers exhibit a mutator phenotype: clinical implications. Loeb LA, Bielas JH, Beckman RA. Cancer Res.
2008 May 15;68(10):3551-7; discussion 3557.

PMID:18483233 

DNA  deletions  and  clonal  mutations  drive  premature  aging  in  mitochondrial  mutator  mice.  Vermulst  M, 
Wanagat  J,  Kujoth  GC,  Bielas  JH,  Rabinovitch  PS,  Prolla  TA,  Loeb  LA.  Nat  Genet.  2008  Apr;40(4):392-4. 
PMID:18311139

Mitochondrial  point  mutations  do  not  limit  the  natural  lifespan  of  mice.  Vermulst  M,  Bielas  JH,  Kujoth  GC, 
Ladiges  WC,  Rabinovitch  PS,  Prolla  TA,  Loeb  LA.  Nat  Genet.  2007  Apr;39(4):540-3.  Epub  2007  Mar  4. 
PMID:17334366 

Comment  in:  Mitochondrial  DNA  mutations  and  aging:  a  case  closed?  Khrapko  K,  Vijg  J.  Nat  Genet.  2007  Apr;39(4):445-6. 

LOH-proficient  embryonic  stem  cells:  a  model  of  cancer  progenitor  cells?  Bielas  JH,  Venkatesan  RN,  Loeb  LA. 
Trends Genet. 2007 Apr;23(4):154-7.

PMID:17328987 

Limits to the Human Cancer Genome Project? Loeb LA, Bielas JH. Science. 2007 Feb 9;315(5813):762;
author  reply  764-5. 

PMID:17297724 

Human  cancers  express  a  mutator  phenotype.  Bielas  JH,  Loeb  KR,  Rubin  BP,  True  LD,  Loeb  LA.  Proc  Natl 
Acad Sci U S A. 2006 Nov 28;103(48):18238-42.

PMID:17108085 

Comment in: Random mutations, selected mutations: A PIN opens the door to new genetic landscapes. Klein CA. Proc Natl Acad Sci USA. 2006 Nov
28;103(48):18033-4.

Non-transcribed  strand  repair  revealed  in  quiescent  cells.  Bielas  JH.  Mutagenesis.  2006  Jan;21(1):49-53. 
PMID:16394029 

Generation  of  mutator  mutants  during  carcinogenesis.  Venkatesan  RN,  Bielas  JH,  Loeb  LA.  DNA  Repair 
(Amst).  2006  Mar  7;5(3):294-302. 

PMID:16359931 

Quantification of random genomic mutations. Bielas JH, Loeb LA. Nat Methods. 2005 Apr;2(4):285-90.

Unifying concept of DNA repair: the polymerase scanning hypothesis. Heddle JA, Bielas JH. Environ Mol
Mutagen. 2005 Mar-Apr;45(2-3):143-9.

PMID:15672383 

Mutator phenotype in cancer: timing and perspectives. Bielas JH, Loeb LA. Environ Mol Mutagen. 2005
Mar-Apr;45(2-3):206-13.

PMID:15672382 

Quiescent  murine  cells  lack  global  genomic  repair  but  are  proficient  in  transcription-coupled  repair.  Bielas  JH, 
Heddle JA. DNA Repair (Amst). 2004 Jul 2;3(7):711-7.

PMID:15177180 

Elevated  mutagenesis  and  decreased  DNA  repair  at  a  transgene  are  associated  with  proliferation  but  not 
apoptosis in p53-deficient cells. Bielas JH, Heddle JA. Proc Natl Acad Sci USA. 2003 Oct 28;100(22):12853-8.

PMID:14569010 

A  more  efficient  Big  Blue  protocol  improves  transgene  rescue  and  accuracy  in  adduct  and  mutation 
measurement.  Bielas  JH.  Mutat  Res.  2002  Jul  25;518(2):107-12. 

PMID:12113761

Proliferation  is  necessary  for  both  repair  and  mutation  in  transgenic  mouse  cells.  Bielas  JH,  Heddle  JA.  Proc 
Natl Acad Sci U S A. 2000 Oct 10;97(21):11391-6.

PMID:11005832

Comment  in:  DNA  damage,  DNA  repair,  cell  proliferation,  and  DNA  replication:  How  do  gene  mutations  result?  O'Neill  JP.  Proc  Natl  Acad  Sci  USA. 
2000  97(21)  11137-11139 

The cII locus in the MutaMouse system. Swiger RR, Cosentino L, Shima N, Bielas JH, Cruz-Munoz W, Heddle
JA. Environ Mol Mutagen. 1999;34(2-3):201-7.

PMID:10529745

INTELLECTUAL  PROPERTY 

1. Bielas, J and Bertout JA. 2013. Compositions and Methods for Accurately Identifying Mutations.
PCT/US2013/025,505, filed February 15, 2013, pending.

2.  Bielas,  J  and  Taylor  SD.  2012.  Compositions  and  Methods  for  Detecting  Rare  Nucleic  Acid  Molecule 
Mutations. US 61/654,236, filed June 1, 2012, pending.

3.  Bielas,  J,  Taylor  SD,  and  Laurie  MT.  2013.  Compositions  and  Methods  for  Detecting  Rare  Nucleic  Acid 
Molecule  Mutations.  US  61/783,815,  Filed  March  14,  2013,  pending. 

4.  Bielas,  J.  2012.  Quantification  of  Adaptive  Immune  Cell  Genomes  in  a  Complex  Mixture  of  Cells. 
PCT/US2012/061,193, filed October 29, 2012, pending.

5.  Bielas,  J,  Robins  H,  and  Livingston  JR.  2012.  Quantification  of  Adaptive  Immune  Cell  Genomes  in  a 
Complex  Mixture  of  Cells.  US  13/656,265,  filed  October  19,  2012,  pending. 

6.  Bielas,  J,  2012.  Compositions  and  Methods  for  Sensitive  Mutation  Detection  in  Nucleic  Acid  Molecules. 
US61/659,837,  filed  June  14,  2012,  pending. 

INVITED  LECTURES 

11/2013 Mitochondrial DNA maintenance, 11th International Conference on Environmental Mutagens
(11th ICEM), Foz do Iguassu, PR, Brazil

10/2013 Mechanism and Clinical Utility of Nuclear and Mitochondrial DNA Mutations in Cancer, 5th
Meeting on Fundamental Aspects of DNA Repair and Mutagenesis, University of Sao Paulo,
Sao Paulo, Brazil

09/2013 Ultra-sensitive Detection of Nuclear and Mitochondrial DNA Mutations, Mutation Detection
Workshop, 15th International Conference on Chronic Myeloid Leukemia: Biology and Therapy,
Estoril, Portugal

08/2013 Mitochondrial DNA Mutagenesis: Insight into Human Aging, Carcinogenesis, and Novel
Anticancer Therapies, Ellison Medical Foundation Colloquium on the Biology of Aging, Woods
Hole, MA

05/2013 Mechanism and Clinical Utility of Nuclear and Mitochondrial DNA Mutations in Cancer,
Biochemistry and Molecular Biology, Faculty of Medicine, Dalhousie University, Halifax, NS,
Canada

03/2013 Mechanism and Clinical Utility of Nuclear and Mitochondrial DNA Mutations in Cancer, Irish
Association for Cancer Research Annual Meeting, Dublin, Ireland

12/2012 Digital Detection of Rare Mutations, First Annual Droplet Digital PCR User Meet, Boston, MA

10/2012 Digital Detection of Rare Mutations and Tumor Infiltrating T Cells: Clinical Application in Cancer
and Disease, Digital PCR Applications and Advances, Cambridge Healthtech Institute, San
Diego, CA

10/2012 Mechanism and Clinical Utility of Nuclear and Mitochondrial DNA Mutations in Cancer, External
Scientific Advisory Board Meeting, Fred Hutchinson and University of Washington Cancer
Consortium, Seattle, WA

09/2012 Metabolism and Mitochondrial Mutagenesis in Human Colorectal Cancer, Metabolism and
Metabolites Symposium, Fred Hutchinson Cancer Research Center, Seattle, WA

07/2012 Nuclear and Mitochondrial DNA Mutations: Mechanisms and Disease, Meeting of the
Outstanding New Environmental Scientist (ONES) Grantee Forum, NIEHS, Research Triangle
Park, NC

05/2012 Mutant Nuclear and Mitochondrial DNA-Based Biomarker Discovery: Validation for the
Detection, Prognosis and Treatment of Cancer, Disease Biomarkers Conference, London,
England

02/2012 Digital Detection of Ultra-Rare Nuclear and Mitochondrial Mutations: Fundamental and Clinical
Implications in Cancer and Disease, Digital PCR Short Course, Molecular Med Tri-Con 2012,
Cambridge Healthtech Institute, San Francisco, CA

02/2012 Finding the Needle in the Haystack, The Institute For Prostate Cancer Research Symposium,
Fred Hutchinson Cancer Research Center, Seattle, WA

08/2011 Ultra-Sensitive Mutant Nuclear and Mitochondrial DNA-Based Biomarker Discovery: Validation
for the Detection, Prognosis and Treatment of Cancer, QuantaLife Droplet Digital PCR Road
Show, Boston, MA, Potomac, MD, New York, NY, Chicago, IL, Seattle, WA, San Diego, CA, San
Francisco, CA

06/2011 The Mechanism and Clinical Utility of Somatic Mitochondrial Mutagenesis in Cancer, 11th
International Symposium on Mutations in the Genome, Santorini, Greece

07/2011 Mechanisms of Environmental Mitochondrial Mutagenesis, Meeting of the Outstanding New
Environmental Scientist (ONES) Grantee Forum, NIEHS, Research Triangle Park, NC

10/2010 Mutagenesis in Mitochondrial Genome, Environmental Mutagen Society Meeting, Complex
Systems in Biology and Risk Assessment, Fort Worth, TX

09/2010 Somatic Mitochondrial DNA Mutations in Aging and Disease: Cause and Consequences, Center
for Exercise Science Seminar Series, University of Florida

08/2009 The mechanism and clinical utility of somatic mitochondrial mutagenesis in cancer, 10th
International Conference on Environmental Mutagens (ICEM), Florence, Italy

07/2009 Mitochondrial DNA mutations as biomarkers. PNW Prostate Cancer SPORE Retreat,
Vancouver, BC

09/2008 Mutations in Cancer Evolution. 36th Annual Association of Graduate Students in Biological
Sciences (AGSBS) Symposium. The Evolution of Biology at York: Past, Present, and Future.
York University’s 50th Anniversary

10/2008 Nuclear and Mitochondrial Mutations in Cancer. Environmental Mutagen Society’s 39th Annual
Meeting, Puerto Rico

11/2007 Human cancers exhibit point mutation instability (PIN). Making Connections: A Canadian
Cancer Research Conference Celebrating the National Cancer Institute of Canada’s 60th
Anniversary, Toronto, ON

10/2007  Proliferation,  mutation  and  cancer.  The  Mayo  Clinic  Mutation,  Recombination,  Diversity  and 
Adaptation  Workshop,  Rochester,  MN 

08/2007  Novel  mitochondrial  and  nuclear  genomic  biomarkers  of  mutagenic  environmental  exposure. 
Health  Canada,  Ottawa,  ON 

06/2007  Random  mutations  and  cancer:  Fundamental  and  clinical  implications.  Children's  Memorial 
Research  Center  at  Northwestern  University  School  of  Medicine,  Chicago,  IL 

05/2007  Random  mutations  and  cancer:  Fundamental  and  clinical  implications.  University  of  Kentucky 
College  of  Medicine,  Lexington,  KY 

04/2007  Spontaneous  random  mutation  and  cancer.  The  Fred  Hutchinson  Cancer  Research  Center 
Radiation  Biology  Seminar  Series,  Seattle,  WA 

09/2006  Human  cancers  exhibit  a  mutator  phenotype.  Environmental  Mutagen  Society’s  37th  Annual 
Meeting,  Vancouver,  BC 

09/2006  Director,  Mutational  analysis  of  cancer  workshop.  University  of  Washington  Department  of 
Pathology  Retreat,  Leavenworth,  WA 

07/2006  Evolution  and  selection  of  enzymes  for  myeloprotection.  University  of  Washington  Gene 
Therapy  Seminar  Series,  Seattle,  WA 

03/2006  Spontaneous  mutagenesis  and  cancer.  Southern  Alberta  Cancer  Research  Institute,  Calgary, 
AB 

03/2006  Human  cancers  exhibit  an  elevated  frequency  of  random  point  mutations.  Gordon  Research 
Conferences  on  DNA  Damage,  Mutation  and  Cancer,  Ventura,  CA 

09/2005  A  novel  method  to  quantify  extremely  rare  random  genomic  mutations.  International  Conference 
on  Environmental  Mutagens,  San  Francisco,  CA 

09/2002  Measuring  the  rate  of  DNA  repair  and  mutation  in  mammalian  cells.  Midwest  Research  Institute, 
Palm  Bay,  FL 

RESEARCH  SUPPORT 

Pending  Research  Support 


Tracking #GRANT11418110 Bielas (PI) 04/01/2014-03/31/2019

NIH $1,250,000

Mitochondrial DNA Mutagenesis in Cancer Prognosis and Treatment

Mutations  in  both  the  nuclear  and  mitochondrial  DNA  (mtDNA)  are  believed  to  play  a  role  in  tumor  growth  and 
metastasis.  It  is  the  goal  of  this  proposal  to  delineate  the  relationships  among  mitochondrial  mutagenesis,  cell 
metabolism,  and  cancer.  Successful  completion  of  this  project  will  result  in  novel  chemotherapeutic  strategies 
and  improved  patient  prognosis,  thereby  improving  patient  survival  and  quality  of  life. 


Ongoing  Research  Support 

AG-NS-0577-09 Bielas (PI) 07/27/2009 - 07/26/2013

Ellison  Medical  Foundation  $400,000 

Mechanisms  of  Human  Mitochondrial  Mutagenesis  in  Aging  and  Disease 

The goal of this project is to investigate the relationship between mitochondrial DNA mutations and aging and, by
extension, to explore methods that prevent and/or slow the accumulation of random mtDNA mutations,
age-related debilitation, and disease.

Role:  PI 


W81XWH-10-1-0563 Bielas (PI) 07/19/2010-07/18/2013

CDMRP/Department  of  Defense  $225,000 

Mitochondrial  DNA  Biomarker  Discovery  and  Validation  for  the  Detection,  Prognosis,  and  Treatment  of 
Prostate  Cancer 

The  goal  of  this  project  is  to  establish  whether  the  prevalence  of  circulating  tumor  mtDNA,  marked  by  somatic 
mtDNA  mutations,  can  serve  as  a  sensitive  marker  of  clinical  stage,  progression,  and  recurrence  in  prostate 
cancer.

Role: PI


1  R01  ES019319  Bielas  (PI)  09/01/2010-04/30/2015 

NIEHS  $1,625,000 

Mechanisms  of  Environmental  and  Nuclear  and  Mitochondrial  Mutagenesis 

The  goal  of  this  proposal  is  to  determine  the  molecular  mechanisms  of  somatic  mtDNA  mutagenesis 
associated  with  DNA  damaging  agents  and  disease. 

Role:  PI 


Completed  Research  Support 

Gregory Fund Bielas (PI) 04/01/2012-03/31/2013

Listwin Foundation $105,600

Digital Quantification of Tumor Infiltrating T-lymphocytes (TIL)

The goal of this project is the development of a digital assay to measure the number and clonality of TIL within
a solid tumor.

Role: PI


1 R01 CA127228-01A1 Campbell (PI) 12/01/2007-07/31/2013

NIH/NCI

Mechanisms of PDGF-C induced Hepatocellular Carcinogenesis

Determine pathways that regulate fibrogenesis, angiogenesis, and liver tumorigenesis.

Role: Collaborator


Bid & Proposal Bielas (PI) 07/01/2011 - 06/30/2012

FHCRC  $10,000 

Digital  Quantification  of  Tumor  Infiltrating  T-lymphocytes  (TIL) 

The  ultimate  goal  of  this  pilot  is  to  generate  sufficient  background  data  to  be  competitive  for  an  R01  to  fund  the 
continuation  of  our  work  to  develop  a  digital  assay  to  measure  the  number  and  clonality  of  TIL  within  a  solid 
tumor. 

Role:  PI 

FHCRC206658  Bielas  (PI)  09/01/2008-06/30/2012 

Fred  Hutchinson  Cancer  Research  Center 

Bielas  Lab  New  Development  Support 

Funding provided to newly appointed faculty at the Assistant Member Level to support the development of their
scientific  research  environment.  Funds  are  used  to  cover  the  operations  of  the  lab. 

Role:  PI 

P30CA015704 Hartwell (PI) 01/01/2010-12/31/2010

NIH/NCI 

Cancer  Center  Support  Grant 

Pilot Study (Syrjala): Mechanisms for Persistent Skeletal Muscle Dysfunction After Cancer Treatment in
Mouse  and  Human  Models 

The  goal  of  this  study  is  to  test  an  animal  model  of  short  and  long-term  mitochondrial  damage  resulting  from  an 
alkylating  and  an  anthracycline  agent,  both  widely  used  in  cancer  chemotherapies. 

Role:  Collaborator 


P30CA015704 Hartwell (PI) 01/01/2009-12/31/2010

NIH/NCI

Cancer Center Support Grant

Pilot Project: Novel ultra sensitive DNA-based cancer prostate markers

Our goal is to analyze the prevalence of homoplasmic mutations in prostate cancer.

Role: Pilot Project PI

LAB MEMBERS


Current 

04/2013- Haley BrinJones, Lab Aide

07/2013- William Valente, B.S., Medical Scientist Training Program (MD/PhD Candidate)

09/2011- Jessica Bertout, V.M.D., Ph.D., Post-Doctoral Research Fellow

06/2010- Sean Taylor, Ph.D., Post-Doctoral Research Fellow

11/2008- Nolan Ericson, B.S., Research Technician and Lab Manager

Rotation  Students 

Molecular  Cellular  Biology  (MCB): 

09/2012-12/2012 Ethan Ahler, B.S.

09/2011-12/2012 Andrew Mathewson, B.S.

03/2010-06/2010 Sam Lancaster, B.S.

Medical Scientist Training Program (MSTP):
06/2012-08/2012 William Valente, B.S.


Past 

09/2009-07/2013 Mariola Kulawiec, Ph.D., Post-Doctoral Research Fellow

03/2010-07/2013 Mathew Laurie, B.S., Research Technician

06/2011-08/2011 Tyler Gable, Summer Undergraduate Research Program Intern, Eastern Washington
University

01/2009-05/2010 Dorothy Park, B.S., Lab Aide

08/2009-10/2009 Katja Schmalbach, M.A. trainee, Host Ph.D. student, University of Wurzburg

01/2009-07/2009 Danielle Harden, M.A., Development Consultant

09/2008-03/2009 Scott Paulson, B.S., Project coordinator